S-1 Supercomputer (1975-1988)

Mark Smotherman
Last update: April 2013

... under construction ...
corrections are welcome

... at Livermore Lowell [Wood] asked the graduate students [Tom McWilliams and Curt Widdoes] to team up and give some thought to designing a supercomputer from scratch.

-- William Broad, Star Warriors, p. 32

Summary: The S-1 project was an attempt to build a family of multiprocessor supercomputers. The project was envisioned by Lowell Wood at the Lawrence Livermore National Lab in 1975 and staffed for the first three years by two Stanford University Computer Science graduate students, Tom McWilliams and Curt Widdoes.

That two graduate students could design and almost completely build a supercomputer by themselves is an amazing feat, comparable to the design and building of the CDC 6600 by Seymour Cray and a small staff a dozen years earlier. However, McWilliams and Widdoes are even better known for the major advances in CAD tools for logic design that they developed as part of the early days of the project and for the startup company they founded, Valid Logic Systems.

The project was supported by the US Navy and ramped up in 1978 with the addition of more students, including Mike Farmwald and Jeff Rubin, and again in 1979. Dr. Carl Haussman provided the day-to-day oversight as the project team grew in size.

Five generations of S-1 processors were planned, and two MSI/ECL generations were built. The project independently invented two-bit branch prediction, directory-based cache coherency, and multiprocessor synchronization using load linked and store conditional. The project also influenced the development of programming languages and compilers including Common LISP and gcc.

S1 article
LLNL Newsline, January 10, 1979
(Courtesy LLNL)

Introduction

Dr. Lowell Wood, a physicist at LLNL and protege of Edward Teller, led the special studies group at LLNL, which was called the O-Group. The O-Group members had many interests, but their work mainly revolved around ideas for a national missile defense. Wood was also an interviewer for the Hertz Foundation, which awarded prestigious scholarships to graduate students interested in the applied sciences. From this position, Wood could occasionally recruit top students to work in the summers at the lab.

Two of the Hertz Foundation scholarship recipients that Wood recruited were Curt Widdoes (in 1973) and Tom McWilliams (in 1975). Widdoes had enrolled in the Ph.D. program in computer science at Stanford and was working on the design of the Minerva multiprocessor system in 1975, when McWilliams started his summer job at the lab. Wood encouraged McWilliams to meet Widdoes and challenged them to design and build a supercomputer.

In fact, Wood envisioned a family of multiprocessor supercomputers in which each node would be comparable in power to contemporary commercial supercomputers. The plan was to build five generations of processors with the same general architecture and to develop computer-aided logic design tools that would ease the task of reimplementing the processors in each new logic technology family. The fifth generation was planned to use wafer-scale integration (WSI).

With the support of the US Navy, two MSI generations of processors were built (but were not strictly compatible):

  1. Mark I (1978)

  2. Mark IIA (1982)

The third-generation processor design was called the AAP (Advanced Architecture Processor). There were also a few references to it as the Mark IIB. The AAP was a RISC-like redesign. The processor had a 32-bit orientation, but each processor register and memory word was augmented with four tag bits for better support of Lisp, Prolog, and garbage collection. The AAP retained the cache-coherent shared-memory multiprocessor programming model of the Mark IIA, but it used dual counter-rotating slotted rings as the global interconnection network. This interconnection was designed to scale to 256 processors. Each AAP processor was supposed to be the size of "a microwave oven".

Project Timeline

(separate page)

CAD tools

... tbd ...

SCALD - structured computer-aided logic design

SCALD I components

  1. SUDS (Stanford University Drawing System) schematic editor
  2. macro expander - Tom McWilliams, 8K lines of code initially
  3. router - wire lister, Curt Widdoes, 12K lines of code initially
(SCALD I ended up with approx. 30K lines of Pascal code)

produced wire wrap list
could also produce change list (wrap/unwrap) for updating an old board
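As a rough illustration of the wrap/unwrap change list idea, the sketch below diffs two wire lists. It is not SCALD's actual algorithm or data format, neither of which is described here; the function name `change_list` and the pin names are hypothetical, and real wire-wrap concerns such as wrap levels on a pin are ignored.

```python
def change_list(old_wires, new_wires):
    """Return (unwrap, wrap): the connections to remove from the old board
    and the connections to add so that it matches the new design.

    Each wire is modeled as a (pin, pin) tuple; this representation is an
    assumption made for the example.
    """
    old, new = set(old_wires), set(new_wires)
    unwrap = sorted(old - new)  # present only on the old board
    wrap = sorted(new - old)    # required only by the new design
    return unwrap, wrap


# Hypothetical boards: one connection moves from U5.1 to U6.1.
old_board = [("U1.3", "U2.7"), ("U2.8", "U5.1"), ("U3.2", "U4.4")]
new_board = [("U1.3", "U2.7"), ("U2.8", "U6.1"), ("U3.2", "U4.4")]
unwrap, wrap = change_list(old_board, new_board)
```

Only the wires that differ appear in either list, which is what makes updating an existing board cheaper than rewiring it from scratch.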

SCALD II

  1. packager - Curt Widdoes, 30K lines of code initially
  2. timing verifier - Tom McWilliams, 6K lines of code initially
  3. (later) automated placement of chips - Jeff Rubin

led to Valid Logic as startup (see below - which needs to be incorporated here)

Instruction set architecture

... preliminary ...

influenced by PDP-10
36-bit word chosen for addressing

inst. set was widely peer-reviewed; over 100 people from Stanford, MIT, and CMU reviewed it, including Forest Baskett, who was McWilliams' and Widdoes' adviser

Branch prediction

Two bits were used - a prediction bit ("jump bit") and a dynamic reverse bit ("wrong bit"). The dynamic reverse bit was set whenever the branch was mispredicted; two mispredicts in a row caused the jump bit to toggle. This scheme has a state diagram just like the two-bit predictor given in Hennessy and Patterson but with the strongly-taken state at the top left having the bit pattern of 10 (rather than 11) and weakly-taken having 11 (rather than 10). A.J. Smith cites an unpublished memo by Widdoes at Stanford in February 1977 that outlines the scheme; Widdoes says it was a joint invention between himself and McWilliams.
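The jump-bit / wrong-bit scheme described above can be sketched as a small state machine. This is a minimal model, not the S-1 hardware: the class and method names are hypothetical, and the rule that a correct prediction (or a toggle of the jump bit) clears the wrong bit is an assumption chosen so that the transitions match the Hennessy and Patterson state diagram the text refers to.

```python
class JumpWrongPredictor:
    """Two-bit predictor: a prediction bit ("jump") and a "wrong" bit."""

    def __init__(self, jump=False, wrong=False):
        self.jump = jump    # True means "predict taken"
        self.wrong = wrong  # True means the last prediction was wrong

    def predict(self):
        return self.jump

    def update(self, taken):
        if taken == self.jump:
            # Correct prediction: assumed to clear the wrong bit.
            self.wrong = False
        elif self.wrong:
            # Second mispredict in a row: toggle the jump bit.
            # Clearing the wrong bit here is an assumption.
            self.jump = not self.jump
            self.wrong = False
        else:
            # First mispredict: remember it, keep the prediction.
            self.wrong = True

    def state(self):
        """Bit pattern as jump bit followed by wrong bit."""
        return f"{int(self.jump)}{int(self.wrong)}"


p = JumpWrongPredictor(jump=True)  # state "10": strongly taken
p.update(False)                    # one mispredict: state "11", still predicts taken
```

With this encoding the strongly-taken state is 10 and the weakly-taken state is 11, as stated above; a single anomalous outcome (for example a loop exit) does not flip the prediction.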

(The 2-bit counter scheme was independently invented by Jim Smith at CDC in 1979-1980.)

CISCy instructions like min, max, and qpart were added to Mark IIA to handle situations where branches were unpredictable.
[see Farmwald's dissertation]
[tie in design pressure to deal with branches and the Los Alamos and IBM experience with poor performance with branch prediction on the Stretch; could compare this to recent trend to use predication as the response to unpredictable branches]

I/O structure

"I also don't see that you've mentioned the I/O architecture. I/O was accomplished with the assistance of I/O processors that communicated with the S-1 through I/O memories. There were special instructions to assist in mapping the 9-bit quarterwords, 18-bit halfwords, 36-bit singlewords, and 72-bit doublewords into multiples of 8-bit bytes, possibly with different endianness, There were Unibus and Qbus interfaces to the IOM. We had a PDP-11 as one I/O processor, but the production IOP for both Unix and Amber was a Q-bus-based 68010 system. The same IOP code supported both operating systems; it provided console I/O, mass storage, and networking."

Multiprocessor structure

... tbd ...

16 processors interconnected to 16 memory modules by a crossbar
- influenced by C.mmp
- originally intended to implement software-based cache coherency
- later implemented a directory-based cache coherency scheme (which was also independently invented by Censier and Feautrier)
- the central directory scheme used a 17-bit vector (16 presence bits and one dirty bit -- the same approach was later used in DASH)
- coherency state transitions used the inter-processor interrupt bus to signal nodes to invalidate
- I/O devices were attached to specific processors
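A directory entry of the kind listed above (16 presence bits plus one dirty bit) can be modeled as below. This is a generic sketch of a full-bit-vector directory, not the S-1 protocol: the class name, the read/write transitions, and the position of the dirty bit in the 17-bit vector are all assumptions.

```python
NUM_PROCESSORS = 16


class DirectoryEntry:
    """Per-block directory state: 16 presence bits and one dirty bit."""

    def __init__(self):
        self.presence = 0   # bit p set means processor p may hold a copy
        self.dirty = False  # True means exactly one holder has modified it

    def sharers(self):
        return [p for p in range(NUM_PROCESSORS) if (self.presence >> p) & 1]

    def read(self, p):
        """Processor p reads the block.

        Returns the processor that must first write the block back,
        or None if memory is already up to date.
        """
        owner = None
        if self.dirty:
            holder = self.sharers()[0]
            if holder != p:
                owner = holder
                self.dirty = False  # block is clean again after write-back
        self.presence |= 1 << p
        return owner

    def write(self, p):
        """Processor p writes the block.

        Returns the list of other processors whose copies must be invalidated.
        """
        invalidate = [q for q in self.sharers() if q != p]
        self.presence = 1 << p
        self.dirty = True
        return invalidate

    def vector(self):
        """17-bit encoding; placing the dirty bit at bit 16 is an assumption."""
        return (int(self.dirty) << 16) | self.presence


entry = DirectoryEntry()
entry.read(0)
entry.read(3)
to_invalidate = entry.write(3)  # processor 0 holds a stale copy
```

The list returned by `write` corresponds to the nodes that would be signaled to invalidate, which the notes above say was done over the inter-processor interrupt bus.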

moved from RMW approach to LL/SC (Jensen, Hagensen, and Broughton)
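The difference from a read-modify-write instruction can be seen in a small single-threaded simulation of load linked / store conditional. It models the generic semantics only (a store conditional fails if the location was stored to since the matching load linked); the class and method names are hypothetical and nothing here reflects the S-1 instruction encoding.

```python
class Memory:
    """Toy shared memory with per-processor LL/SC reservations."""

    def __init__(self):
        self.cells = {}
        self.reservations = {}  # processor id -> reserved address

    def load_linked(self, cpu, addr):
        self.reservations[cpu] = addr
        return self.cells.get(addr, 0)

    def store(self, addr, value):
        """Ordinary store: breaks every reservation on addr."""
        self.cells[addr] = value
        for cpu, reserved in list(self.reservations.items()):
            if reserved == addr:
                del self.reservations[cpu]

    def store_conditional(self, cpu, addr, value):
        """Store only if cpu's reservation on addr is still intact."""
        if self.reservations.get(cpu) != addr:
            return False
        self.store(addr, value)
        return True


def atomic_increment(mem, cpu, addr):
    """Retry loop that builds an atomic update out of LL/SC."""
    while True:
        old = mem.load_linked(cpu, addr)
        if mem.store_conditional(cpu, addr, old + 1):
            return old + 1


mem = Memory()
first = atomic_increment(mem, cpu=0, addr=100)

# Interleaving: another processor stores between CPU 0's LL and SC.
seen = mem.load_linked(0, 100)
mem.store(100, 7)
lost_race = not mem.store_conditional(0, 100, seen + 1)
```

Because the failed store conditional leaves memory untouched, the retry loop simply reloads and tries again, and any read-modify-write operation can be composed this way rather than provided as a separate instruction.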

AAP had ring interconnection

Compilers, interpreters, and software tools

... tbd ...

emulator and optimizing assembler - Jeff Rubin

Pastel - Jeff Broughton

The early OS work was done in PL/I, but the team later switched to their own systems-programming version of Pascal, which they called "Pastel".

Lisp - Richard Gabriel and Rod Brooks

"S-1 Lisp, never completely functional, was the test bed for adapting advanced compiler techniques to Lisp implementation."

"One implementation of Common Lisp, namely S-1 Lisp, already has a compiler that produces code for numerical computations that is competitive in execution speed to that produced by a Fortran compiler."

later project to port C and Unix, involved Richard Stallman doing the C front-end

"Hoping to avoid the need to write the whole compiler myself, I obtained the source code for the Pastel compiler, which was a multi-platform compiler developed at Lawrence Livermore Lab. It supported, and was written in, an extended version of Pascal, designed to be a system-programming language. I added a C frontend, and began porting it to the Motorola 68000 computer. But I had to give that up when I discovered that the compiler needed many megabytes of stack space, and the available 68000 Unix system would only allow 64K. I then determined that the Pastel compiler was designed to parse the entire input file into a syntax tree, convert the whole syntax tree into a chain of "instructions," and then generate the whole output file, without ever freeing any storage. At this point, I concluded I would have to write a new compiler from scratch. That new compiler is now known as GCC; none of the Pastel compiler is used in it, but I managed to adapt and use the C frontend that I had written. But that was some years later; first, I worked on GNU Emacs."

-- Richard Stallman, The GNU Operating System and the Free Software Movement

"I didn't really know much about optimizing compilers at the time, because I'd never worked on one. But I got my hands on a compiler, that I was told at the time was free. It was a compiler called PASTEL, which the authors say means ``off-color PASCAL''.

"Pastel was a very complicated language including features such as parametrized types and explicit type parameters and many complicated things. The compiler was of course written in this language, and had many complicated features to optimize the use of these things. For example: the type ``string'' in that language was a parameterized type; you could say ``string(n)'' if you wanted a string of a particular length; you could also just say ``string'', and the parameter would be determined from the context. Now, strings are very important, and it is necessary for a lot of constructs that use them to run fast, and this means that they had to have a lot of features to detect such things as: when the declared length of a string is an argument that is known to be constant throughout the function, to save to save the value and optimize the code they're going to produce, many complicated things. But I did get to see in this compiler how to do automatic register allocation, and some ideas about how to handle different sorts of machines.

"Well, since this compiler already compiled PASTEL, what I needed to do was add a front-end for C, which I did, and add a back-end for the 68000 which I expected to be my first target machine. But I ran into a serious problem. Because the PASTEL language was defined not to require you to declare something before you used it, the declarations and uses could be in any order, in other words: Pascal's ``forward'' declaration was obsolete, because of this it was necessary to read in an entire program, and keep it in core, and then process it all at once. The result was that the intermediate storage used in the compiler, the size of the memory needed, was proportional to the size of your file. And this also included stack-space, you needed gigantic amounts of stack space, and what I found as a result was: that the 68000 system available to me could not run the compiler. Because it was a horrible version of Unix that gave you a limit of something like 16K words of stack, this despite the existence of six megabytes in the machine, you could only have 16Kw of stack or something like that. And of course to generate its conflict matrix to see which temporary values conflicted, or was alive at the same time as which others, it needed a quadratic matrix of bits, and that for large functions that would get it to hundreds of thousands of bytes. So i managed to debug the first pass of the ten or so passes of the compiler, cross compiled on to that machine, and then found that the second one could never run.

... "The new C compiler is something that I've written this year since last spring. I finally decided that I'd have to throw out PASTEL. This C compiler uses some ideas taken from PASTEL, and some ideas taken from the University of Arizona Portable Optimizer."

-- Stallman lecture at KTH (Stockholm, Sweden), October 1986

An automatic parallelization tool was under development called the Paralyzer (not to be confused with the Illiac IV parallelization tool of the same name).

also Fred Chow, ...

"The first assembler for the AAP was written in Lisp. (The primary language was to be Pastel, so the amount of assembly code was expected to be fairly minimal.) For various reasons, notably assembler runtime performance, it was eventually rewritten in C." - J. Bruner

Operating systems

Amber

The S-1 operating system was called Amber. The OS design was influenced by work at MIT, including Multics, ITS, and the MIT Lisp Machines, and by the Tenex operating system from BBN. The goal was a layered OS structure that could be tailored for support of real-time, time-sharing, and batch applications.

"The design of Amber was begun in 1979 by a team of six. Hon Wah Chin was project leader. Team members were Ted Anderson, Jeff Broughton, Charles Frankston, Lee Parks, and Daniel Weinreb. All team members were familiar with Multics. Lee Parks and Ted Anderson had participated in implementation of a small scale Multics like system as undergraduates [Parks1979]. Daniel Weinreb had worked on the MIT Lisp Machine project as a undergraduate [LispMachine].

"Most of the first year of the effort was spent in design and discussion. The essence of the current capability scheme was devised by Jeff Broughton toward the end of the design period. Prior to this time, the design was much closer to Multics in its concepts of segments, directories, access control and the like. In fact some code which was already written had to be modified to accommodate the new scheme, but the changes were not drastic.

"By that time it was apparent that the S-1 Mark IIA computer system, which Amber was designed for, was going to be ready later than expected. Had the hardware been closer to completion, it is likely that a less ambitious and more expeditious system would have been implemented. However, the continued non-availability of the target computer system was very detrimental to completion of coding efforts. It was increasingly difficult, both technically and psychologically, to continue building a kernel with no real feedback on how successful the elements of the structure thus far implemented were.

"At the end of the first year, Daniel Weinreb and Lee Parks left the project. Hon Wah Chin assumed other duties with the S-1 project and Jeff Broughton became team leader. One and a half years into the project Earl Killian joined in the midst of the switch to Pastel as the implementation language. Three years into the project Charles Frankston took an extended leave to continue his education. Jay Pattin joined the Amber team over four years after the inception of the project."

-- Charles Frankston (see http://www.mit.edu/~cbf/thesis.htm)

and

"The S-1 architecture supported a variable boundary between the segment number and the segment offset which was important for keeping the address to a mere 36 bits. The Mark II also had relative pointers (i.e. pointers that are an offset from the address of the word containing the offset), which was a nice feature for storing databases on disk that are mapped to different addresses in every process.

"There were some good OS ideas that were implemented, including a file system that did not require salvaging/repair when the machine was rebooted after a crash (a background process could recover lost blocks while the system was running and doing useful work). Unlike today's equivalents, it was not based on journaling, but rather careful ordering of operations. The filesystem supported a property-list for every file (a Lisp machine idea I think)."

-- Earl Killian
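The benefit of relative pointers for data mapped at different addresses can be shown with a small model. In this sketch each link holds the distance from the word containing it to its target, as Killian describes; the two-word node layout, the use of 0 as an end marker, and the function names are assumptions made for the example.

```python
def build_blob():
    """A three-node linked list (10 -> 20 -> 30) as a block of words.

    Each node is two words: [value, relative link]. The link is the
    distance from the link word itself to the first word of the next
    node; 0 marks the end of the list (an assumption of this sketch).
    """
    return [
        10, 3,    # node at word 0; link at word 1 points to word 1 + 3 = 4
        30, 0,    # node at word 2; end of list
        20, -3,   # node at word 4; link at word 5 points to word 5 - 3 = 2
    ]


def map_at(memory, base, blob):
    """Copy the blob into memory starting at address base, unmodified."""
    for i, word in enumerate(blob):
        memory[base + i] = word


def walk(memory, head):
    """Follow the relative links starting from the node at address head."""
    values = []
    node = head
    while True:
        values.append(memory[node])
        link_addr = node + 1
        rel = memory[link_addr]
        if rel == 0:
            return values
        node = link_addr + rel  # target = address of link word + offset


memory = {}
map_at(memory, 1000, build_blob())   # one process maps the data here
map_at(memory, 52000, build_blob())  # another maps the same data elsewhere
```

The same words are valid at both base addresses with no relocation step, which is the property that makes the feature attractive for disk-resident databases.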

... combination of capability-based access with access-control lists ...

... processor affinity scheduling ...

The S-1 design made extensive use of diagnostic processors and techniques for fault tolerance. The OS supported dynamic reconfiguration ...

... more to do ...
[am I missing papers from Amber? what impact?]

Unix

There were two operating systems on the S-1 Mark IIA: Amber and Unix.

"The Unix port was based upon the 7th Edition, although the lack of virtual memory wasn't much of an issue because of the huge memory we had at the time (128 megabytes, where a byte was 9 bits wide). The C compiler was built using the (Johnson) Portable C compiler; we wrote (and later rewrote) the assembler and linker. A major challenge in porting Unix and C was the fairly loose distinction at that time between integers and pointers in C. The tagged architecture of the S-1 divided pointers into a 5-bit tag and a 31-bit address. Tags 0 and 31 were invalid, so as to trap in hardware spurious references through small positive and negative integers. However, in C it was common to use the all-zero bit pattern for (void *)NULL, which was an illegal pointer value. [Pastel did not share this problem because the nil pointer had a (nonzero) tag and there was no implicit association of 0 with nil.]" - J. Bruner

Impact

... tbd ...

The most visible impact of the S-1 project was on CAD tools. Two other major contributions were 2-bit dynamic branch prediction and the load linked / store conditional mutual exclusion primitives (which are now found in MIPS, Alpha, and PowerPC).

[inst. set - influenced development of RISC? - DEC through Baskett, MIPS through Hennessy and Killian, ...]
[did DEC PRISM epicode influence AAP magicode?]
[branch prediction - but most attribute two bits to Smith]
[decoded icache - ...]
[cache modes using tags in page table and TLB - used in MIPS, ...]
[MP, directory-based cache coherency - DASH, ...]
[compilers - gcc through Stallman, optimization through Chow, Lisp through ...; MIPS compilers, ...]
[OS - ...]

References

Interviews

Theses

haven't read yet

Web pages

Acknowledgements

Thanks to Jordin Kare for first pointing me to the S-1. Ted Anderson, Tina Darmohray, Earl Killian, and Curt Widdoes have been very helpful to me in correcting my understanding of the overall project, the processor designs, and the OS. My thanks also go to Maxine Trost, archivist at LLNL, for providing me with scanned S-1 articles and pictures. Thanks to Harry Quackenboss for help in correcting a typo in an earlier version.


S-1 Alumni

(separate page)



mark@cs.clemson.edu