Last updated: March 2004
.. portions under construction ..
(from Bill Findlay) "Turing wrote a design study for the ACE in the late 1940s that envisaged a stack of return addresses, partly managed by software, and facilitated by the equivalent of a macro-assembler! Some of his ideas were later incorporated in the pilot ACE (at NPL) and the commercially-produced DEUCE. ... It is a remarkable document, discussing everything from the programming system to whether the delay lines should be filled with alcohol or mercury."
A.M. Turing, "Proposals for the development in the Mathematics Division of an Automatic Computing Engine (ACE)." Report E882, Executive Committee, NPL February 1946. Reprinted April 1972 as NPL Report Com. Sci 57.
(I believe this is either a draft of the paper or is the cited paper itself. It discusses performing subsidiary operations that are invoked by BURY instructions and returning control by UNBURY instructions, along with a stack (LIFO) store of instruction addresses in main memory and a stack pointer in temporary storage [i.e., "TS 31"].)
We also wish to be able to arrange for the splitting up of operations into subsidiary operations. This should be done in such a way that once we have written down how an operation is done we can use it as a subsidiary to any other operation. ... When we wish to start on a subsidiary operation we need only make a note of where we left off the major operation and then apply the first instruction of the subsidiary. When the subsidiary is over we look up the note and continue with the major operation. Each subsidiary operation can end with instructions for this recovery of the note. How is the burying and disinterring of the note to be done? There are of course many ways. One is to keep a list of these notes in one or more standard size delay lines (1024), with the most recent last. The position of the most recent of these will be kept in a fixed TS, and this reference will be modified every time a subsidiary is started or finished. The burying and disinterring processes are fairly elaborate, but there is fortunately no need to repeat the instructions involved, each time, the burying being done through a standard instruction table BURY, and the disinterring by the table UNBURY.[pp. 11-12]
BURY (General Description). The content of TS 1 with 1 added is transferred to the position indicated in TS 31, and 1 is added to the reference in TS 31. We then proceed to carry out the instruction in TS 1. UNBURY (General Description). The minor cycle whose position is given in TS 31 is taken to be position of the next instruction.[p. 30. Notes: TS 1 contains the address of the currently executing instruction. "minor cycle" = word.]
"The Pilot Model ACE ran its first program on May 10, 1950, at the National Physical Laboratory. With a clock speed of 1 MHz, Pilot Model ACE was for some time the fastest computer in the world. DEUCE, the production version of the Pilot Model, was constructed by the English Electric Company. In total more than 30 were sold. NPL's full-scale ACE began work in 1958. The fundamentals of Turing's ACE design were used in the Bendix G15 computer. The G15 was arguably the first personal computer and over 400 were sold worldwide. DEUCE and the G15 remained in use until about 1970. Another computer deriving from Turing's ACE design, the MOSAIC, played a role in Britain's air defences during the Cold War period; other derivatives include the Packard-Bell PB250 (1961)."
(my thanks to Bill Findlay for telling me about the ACE)
location contents (ignoring length code) m : A m - add the number in memory location m to the accumulator (i.e., it loads itself) m + 1 : G n - branch to memory location n (the start of the subroutine m + 2 : ... - control will return here ... n : A 3 - add the number in memory location 3 to the accumulator (it is a predetermined constant that will change "A m" to "G m+2") n + 1 : T p - store new return instruction at bottom of this subroutine ... p : ---- - location for return branchadapted from EDSAC Tutorial Guide, p. 24 (pdf)
"Originally, to simplify the rather complex programming process, subroutines were simply stringed together on the tape. The major advance came when Wheeler thought up the idea of the sub-routine. This meant that, rather than simply executing subroutine after subroutine, a program could execute a subroutine, and after the subroutine had run, the program would again gain control where it left off. This was called the 'Wheeler Jump', and is an optimal programming technique which forms the basis of the way programs are written today." [http://hoc.co.umist.ac.uk/storylines/compdev/electronic/edsac.html]
(thanks to Bill Findlay)
Depending on language and architecture, registers may hold: - uplevel addressing link - exception handler pointer (for Ada) - frame pointer - program counter - stack pointer - actual parameters - return values - constant pool pointer - spare space for called routines to trash - registers reserved for link-time-generated code to use(list from Don Lindsay usenet post)
Early subroutine linkages stored the return address in memory, e.g., as the first word of subroutine.
Other systems with memory-based linkage include the EDSAC (using the Wheeler jump), (check Ferranti Mark 1 / Atlas used extracodes for subroutine calls), Univac I, Bull Gamma 60, IBM Stretch, CDC 160 and 6600 PPU, and Univac 1103A.
The IBM 704 appears to be the first computer to store the return address in a register. This approach, popular for many years because of the in-line passing of parameters, fell out of favor when memory stack linkages became popular in the 1970s. It then regained favor when RISC architects wanted to avoid traffic to a memory stack on each call and return.
The Cray-1 also used a register-based linkage.
... store return address in a memory stack ...
Typical procedure support in 1970s dedicated registers sp - stack pointer, top of stack fp - frame pointer, anchors the stack frame for this subroutine invocation (will not change as things are pushed onto the stack) memory stack frame (subroutine call activation record) | / / / / / /| +------------+ sp -->| ... | | locals | fp-c is usually an address of a local variable | ... | +------------+ fp -->| old fp | | saved regs | | rtn addr | +------------+ | ... | | parameters | fp+k is usually an address of a parameter | ... | +------------+ | / / / / / /| calling program push parameter2 on stack ! typically done in reverse order push parameter1 on stack call subroutine ! pushes return address onto the stack clean parameters off stack ! add constant to stack pointer after ! return subroutine (callee save convention to save registers) /* prologue */ push registers to save push old fp set new fp as current sp subtract from sp to make room for locals /* body */ ... body of subroutine, in which you access parameters with memory addresses fp+k and local vars with memory addresses fp-c ... /* epilogue */ set sp to current fp pop fp pop registers to restore return ! pops return address from stack
S.C. Johnson and D.M. Ritchie, "The C Language Calling Sequence," Bell Labs, CS Tech. Rept. No. 102, September, 1981.
Example linkages - stack machines
Other stack machines include the ICL 2900 series and the HP-3000 (pcal and scal).
Example linkages - general-register machines
See also the Turing ACE.
... desire to reduce memory traffic ...
SPARC designers wanted to eliminate as many memory accesses as possible memory-stack-based call requires at least four memory accesses 1. the return address (pc) is pushed by the call instruction 2. the old fp is pushed at the top of the subroutine 3. the old fp is popped at the bottom of the subroutine, and 4. the return address is popped by the return instruction then there are two memory accesses for each register saved (push to save and then pop to restore). the SPARC designers provided overlapped register windows instead
... design choices:
fixed size vs. variable size
overlapping vs. non-overlapping
Some folks see the idea of register windows in the scratchpad registers of the Fairchild F8 (you changed which of its 64 8-bit registers you could easily access by changing the contents of a 6-bit indirect scratchpad address register) and in the memory-mapped registers of the TI TMS-9900 (you changed the memory mapping of its 16 general registers by changing the contents of the workspace pointer register).
... also Pyramid 90x and Celerity C12000 ...
John Mashey's comments on register windows (Usenet posting to comp.arch)
TO DO ... desire to provide different protection domains ...
Intel x86 call gates
TO DO ... desire to provide different protection domains ...
M68020 callm and retm
Intel 432 call and call_thru_domain
Plessey 250, S/38 (AS/400), ...
Hank Levy's Capability-Based Computer Systems book
My thanks go to several folks who helped me understand the subroutine linkages on the described machines: John Ahlstrom, Bill Findlay, Jim Hull, and Doug Jones. (My apologies to these individuals for any misunderstandings on my part about the information they have shared with me; the errors in the descriptions are entirely mine.)
[History page] [Mark's homepage] [CPSC homepage] [Clemson Univ. homepage]