History of Multithreading
Last major update: April 2005
(Minor update: November 2007)
Summary: Multithreading first appeared in the 1950s, and
simultaneous multithreading (SMT) was investigated by IBM in
Most attempts at a history of multithreaded processors start with
the PPUs in the CDC 6600. However, the ideas of sharing a single
execution path among multiple threads of execution started at least
a decade earlier.
First, there are a number of types of multithreading.
- "multiple sequence"
- multiple program counters
- some degree of program-controlled switching (coroutines)
- a.k.a. explicit-switch, static, blocked multithreading in Ungerer's
- e.g., break bits in instruction format in DYSEAC (1954) and TX-2
- e.g., Transputer (1985)
- "fine-grain MT (FGMT)"
- switch threads on each cycle
- a.k.a. "vertical multithreading"
- a.k.a. "interleaved multithreading (IMT)"
- a.k.a. "time-sharing multithreading"
- a.k.a. "hardware multiprogramming" in H800 (1958)
[see also Singer System 10 - 1970 FJCC]
- e.g., CDC 6600 PPUs
- e.g., HEP
- e.g., Tera MTA - 128 threads
- in-order core
- goal is to cover memory latency and improve overall throughput
- "coarse-grain MT (CGMT)"
- a.k.a. "concurrent multithreading (CMT)"
- a.k.a. "switch-on-event multithreading (SOEMT)"
- a.k.a. "dynamic blocked multithreading (BMT)"
- e.g., IBM AS400 PPC Pulsar
- two threads, 3-cycle thread switch, priority scheme
for scheduling, also can set to switch on L2 miss
- R. Eickemeyer, R. Johnson, S. Kunkel, B.-H. Lim,
M. Squillante, and C. Wu, "Evaluation of Multithreaded
Processors and Thread Switch Policies", 1997 International
Symposium on High Performance Computing, Springer LNCS 1336,
Nov. 1997, Fukuoka, Japan, pp. 75-90.
- J.M. Borkenhagen, R.J. Eickemeyer, R.N. Kalla, and S.R. Kunkel,
"A multithreaded PowerPC processor for commercial servers,"
IBM J. Res. Dev., vol. 44, no. 6, 2000, pp. 885-898.
- goal is to cover latencies and improve overall throughput
- "simultaneous MT (SMT)"
- a.k.a. "horizontal multithreading"
- e.g., U. Wash. work, DEC 21464 (EV8) plans for 4-way SMT
(Joel Emer of Compaq apparently claimed that EV8 would achieve
an IPC speed up of 2x at a cost of 10%. --check for quote)
- e.g., Intel P4 "hyper-threading"
- multiple instructions per cycle, out-of-order core
- goal is to increase IPC
- implicit MT (IMT)
- compiler-based speculative threads with hardware support
beyond regular SMP
- e.g., Manoj Franklin, Multiscalar
- e.g., T.N. Vijaykumar
- dynamic MT (DMT)
- hardware-based speculative threads
- e.g., Akkary & Driscoll, "A Dynamic Multithreading Processor"
- forking threads at call sites and after loops
- 31st Annual ACM/IEEE International Symposium on Microarchitecture,
- Antonio Gonzales, Speculative Multithreading (SpMT),
forking threads for loop bodies with data value prediction
- S. Wallace, B. Calder, and D. M. Tullsen, "Threaded multiple
- H. Wang, P. Wang, R.D. Weldon, S.M. Ettinger, H. Saito, M. Girkar,
S.S-W. Liao, amnd J. Shen, "Speculative Precomputation: Exploring
the Use of Multithreading for Latency." Intel Technology Journal.
- US 6,493,820: Processor having multiple program counters and trace
buffers outside an execution pipeline.
- US 6,463,522: Memory system for ordering load and store instructions in a
processor that performs multithread execution.
- US 6,260,150: Foreground and background context controller
setting processor to power saving mode when all contexts are inactive
- US 6,243,736: Context controller having status-based background
functional task resource allocation capability and processor
employing the same
- US 6,205,468: System for multitasking management employing context
controller having event vector selection by priority encoding of
A partial timeline of multithreaded systems
- CDC 6600 (1964) - multithreading in peripheral processors
- IBM ACS-360 (1968) - successor design to
IBM ACS-1 was planned to be 2-way multithreaded
A second instruction counter and a second set of registers
were added to the simulator to make the ACS-360 into a
multithreaded design. Instructions were tagged with an
additional "red/blue" bit to designate the instruction
stream and register set; and, as project members had expected,
the utilization of the functional units was increased since
more independent instructions were available.
(apparently this same scheme was later investigated for use
on the S/370 model 195)
- Mike Flynn papers on multithreading (check Spring AFIPS, 1967;
ONR 1969; IFIP 1971; IEEE Computer 1972; AFIPS 1972)
[see also: M. J. Flynn and A. Podvin, "Shared Resource Multiprocessing,"
IEEE Computer, Mar. 1972, pp. 20-28.]
- HEP (1978) - Burton Smith
- Xerox Alto (1979) - microtasking
- HEP-2 and HEP-3 designs (a.k.a. Vulture)
- Denelcor closes in 1985
- Transputer (1985)
- Horizon (1988) - Burton Smith
- Stellar GS-1000 (1988) - four-way multithreading
- ... others ... MASA ...
- Tera MTA () - Burton Smith (impl. in GaAs)
- SMT work at Univ. of Washington
- Dean Tullsen, Susan Eggers, and Henry Levy, "Simultaneous
Multithreading: Maximizing On-Chip Parallelism," ISCA 22,
- Dean Tullsen, Susan Eggers, Joel Emer, Henry Levy, Jack Lo,
and Rebecca L. Stamm, "Exploiting Choice: Instruction Fetch
and Issue on an Implementable Simultaneous Multithreading
Processor," ISCA 23, May, 1996.
interview with Joel Emer:
"The challenge in processor design has been getting better
single-threaded performance. I hadn't seriously considered
multithreading since I finished my Ph.D. on that subject in
the late '70s. Then I saw this research from the University
of Washington [Dean Tullsen, Susan Eggers, Henry Levy, 1995]
where they proposed doing simultaneous multithreading. I was
really inspired by this idea of doing things simultaneously.
I thought, 'Well that's compatible with the out-of-order
design style that we're moving towards. And that gives us
the opportunity to develop the best single-threaded machine
we can, and do multithreading as well.'
We had this persistent problem of low utilization of individual
components. That meant that there was a lot of opportunity to
get more performance out of the machine. Their idea was 'Hey,
just fill those opportunities with instructions from another
program.' I was working with Rebecca Stamm at Digital Equipment
Corporation and we worked through a way of incorporating this
SMT idea into an out-of-order design. We began to collaborate
with the University of Washington and studying the application
of SMT in an out-of-order context.
... So in the first go around, Dean [Tullsen] wrote a simulator
and came back and said, 'I ran the out-of-order SMT, and I got
performance that's worse than if I didn't run multiple threads.'
Then he noticed that he was having resource clogging problems.
One of the threads would get a huge number of instructions in
some part of the machine, and it would just bottleneck that
part. None of the other threads could get any work done.
We had a conference call and out of that we realized that
resource allocation, or choosing what to do, was the key to
getting it to work. We proposed some schemes and then explored
'How do we choose which instructions should be executing now?'
The result was a second paper on SMT that Rebecca and I coauthored
with the researchers from the University of Washington. And that's
the seminal paper that explains how important it is to choose which
instruction stream to fetch instructions from at any point in time."
- Michael Fischer - combination of foreground (preemptive) and background
(fixed time slices)
- ... others ... April/Sparcle/Alewife [IEEE Micro, June 1993], *T, ...
Cal Tech Mosaic T [myricom] ...
- Cray/Tera MTA-2 (CMOS)
- Intel Pentium 4 HT
- Greg Byrd and Mark Holliday, "Multithreaded processor
architectures," IEEE Spectrum, 32(8):38-46, August 1995.
- Theo Ungerer, Borut Robic, Jurij Silc,
"A survey of processors with explicit multithreading,"
ACM Computing Surveys, Volume 35, Issue 1, March 2003.
The Computer Journal, Volume 45, Issue 3, 2002, pp. 320-348.
- Iannucci, Gao, Halstead, Smith,
Multithreaded Computer Architecture: A Summary of the State of the Art,
Kluwer Academic Publishers, 1994
- Jon Stokes,
"Introduction to Multithreading, Superthreading and Hyperthreading,"
Ars Technica, October 2002.
[Clemson Univ. homepage]