Prediction and Profile Buffers

Mark Smotherman
Last updated: May 2004

UNDER CONSTRUCTION

Summary: ... tbd ...


... intro to be written ...

... explain purpose ... importance of dynamic branch prediction ... but beyond that high perf. impls. will record and exploit other dynamic behavior of programs ... look at tables and caches that assist inst. fetch and decoding ...

... this survey not oriented to covering all the specific BP algorithms ... (so find cite to paper or web page that is)

... note difference between table structures (mere lookup) versus caches (hit/miss via tagging) ... offer rationale ...


Design Tree

An extensive design tree was posted by Andy Glew in comp.arch in July 1998 and reprinted in CAN Internet Nuggets, Vol. 26, No. 5, December 1998, p. 34.

From: "Andy Glew" 
Newsgroups: comp.arch
Subject: Re: Branch Target Cache (&& AMD 29K)
Date: 15 Jul 1998 23:02:42 GMT
Organization: Intel & University of Wisconsin


At the risk of reducing the number of future patent plaques on my wall,
let me enumerate some possible combinations of the above that I
consider "obvious to someone skilled in the art".


Index structure by:
---------------------------

instruction address
branch target address
branch from address
path (sequence of branch from/to addresses)
branch history (taken/not-taken)
procedure call address
procedure call point
branch address (or procedure address) in combination with certain parameter values
basic block number (i.e. transformed branch target address)
data address
address of data referencing instruction (e.g. load at address A accesses B)
memory addressing mode 
memory addressing mode and parameters
register version numbers or timestamps
any hash or checksum of the above
        e.g bitfield extracts, shifts, XORs, etc.
linear offsets of any of the above
        denominated in a variety of units such as time, etc.
        e.g. branch from address - 2 instructions
transitive closures of the above
        denominated in a variety of units such as time, same unit, etc.
        e.g. the branch from address 2 branches before the instruction


What is cached
-----------------------

raw instructions
data
branch target address
instructions at branch target
decoded instructions
sequence of instructions at branch target
        possibly beyond multiple branches, e.g. a trace
optimized sequence of instructions at branch target
        where optimizations may include
        a) pre-renaming (relative to trace)
        b) rearranging instructions for schedule
        c) eliminating dead code
        d) common sub expressions detected by hardware
        e) predication to eliminate some branches
type information (e.g. this value is an integer; this value is a pointer)
skip list information
future branch history
future branch path
past branch history
past branch path
schedules indicating expected order of fetch or execution of instructions
cache miss addresses
        absolute, or relative to given data values
compressed versions of the above
combinations of the above - e.g. return both a trace of instructions,
        and a expected cache miss addresses 


Branch Prediction Terminology

IBM mainframe systems and patents use the terms

So here are my terms.

     branch   branch   target   target
     address  history  address  insts.
     -------  -------  -------  -------
BHT             yes                      branch history table
BPC   tags      yes      yes      yes    branch prediction cache
BTAC  tags               yes             branch target address cache
BTB   tags      yes      yes             branch target buffer
BTIC  tags                        yes    branch target instruction cache

(My use of BTB and BPC is more a concession to historical usage rather than seeing them as completely descriptive terms.)

For another discussion of terminology, see sections 8.4.4 and 8.4.5 of D. Sima, T. Fountain, and P. Kacuk, Advanced Computer Architecture: A Design Space Approach, Addison-Wesley, 1997 and chapter 4 of A. Omondi, The Microarchitecture of Pipelined and Superscalar Computers, Kluwer, 1999. Note also that branch instruction identification can be accelerated without using a prediction buffer/cache by partial decoding of instruction fetch buffers ("look-ahead branch detection" section 8.4.2 of Sima, et al.).


Actual Designs and Implementations

... to be done / very incomplete ...
... many descriptions of current processors ...
... aim this list at identifying earliest examples in each category ...

(US Patents - for branching see subclasses 238 and 240 under class 712)


[History page] [Mark's homepage] [CPSC homepage] [Clemson Univ. homepage]

mark@cs.clemson.edu