Floating-Point Representation and IEEE-754
A real-valued number is represented in a floating-point format as:
(-1)^Sign × Significand × Base^Exponent
where:
Sign is 0 for positive values, 1 for negative values.
Significand is a real number, composed as integer.fraction.
(Also known as "mantissa".)
Base is an integer value: the numeric base (radix) in which the significand is written.
Typically either 10 (for people) or 2 (for computers) (or 16 for some IBM computers).
Exponent is an integer value.
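As a quick check of the formula, the sketch below evaluates it directly. The helper name fp_value is hypothetical, chosen only for illustration; exact rational arithmetic is used so that the result is not itself disturbed by rounding.

```python
from fractions import Fraction

def fp_value(sign, significand, base, exponent):
    """Evaluate (-1)^sign * significand * base^exponent exactly."""
    return (-1) ** sign * Fraction(significand) * Fraction(base) ** exponent

# -6.5 in base 2: significand 1.625 (binary 1.101), exponent 2
print(float(fp_value(1, 1.625, 2, 2)))
# the same magnitude, positive, in base 10: significand 6.5, exponent 0
print(float(fp_value(0, 6.5, 10, 0)))
```

The same number can be written with different bases and exponents; only the (sign, significand, exponent) triple changes.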
IEEE Standard 754-1985 specifies binary representations for floating-point numbers:
As of Fall 2007, it is expected to be updated by IEEE-754r.
Some of the anticipated changes/additions are noted in the table below.
The Sign is one bit — 0 for positive values, 1 for negative values.
The Exponent is in an excess-N notation: the stored bit pattern, read as an unsigned integer, equals the true exponent plus a fixed bias N.
Exponents of 0, or of maximal value (i.e., all bits of the binary representation are 1), denote special values.
The Base is specified to be 2. (IEEE-854 broadens this, accepting 10 as an alternate base.)
The Significand is normalized except for very small (subnormal) values. In normalized form the integer portion (to the left of the radix point) is exactly one bit.
The normalized significand's range is thus (1.000…) up to (1.111…), in base-2.
A similar range for normalized base-10 numbers is (1.000…) to (9.999…). (Or in mathematical notation [1, 10).)
The subnormal significands range from (0.111…) down to (0.00…01), in base-2.
The leftmost (integer) bit is not stored in the standard formats, but instead is implicit (assumed to be 1 for normalized values, 0 for subnormal values).
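To make the field layout concrete, here is a minimal sketch that splits a value into its 32-bit sign, stored exponent, and fraction fields. It assumes Python's standard struct module and the 32-bit layout (1 + 8 + 23 bits, bias 127); the function name split_binary32 is hypothetical.

```python
import struct

def split_binary32(x):
    """Return (sign, stored_exponent, fraction) for x rounded to 32 bits."""
    # Pack as a big-endian 32-bit float, then reread the bytes as an integer.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                        # 1 bit
    stored_exponent = (bits >> 23) & 0xFF    # 8 bits, excess-127
    fraction = bits & 0x7FFFFF               # 23 bits; the integer bit is not stored
    return sign, stored_exponent, fraction

sign, e, f = split_binary32(6.5)
# True exponent = stored exponent - 127; fraction shown as 23 binary digits.
print(sign, e - 127, format(f, "023b"))  # 0 2 10100000000000000000000
```

The printed fraction omits the leading 1 of the normalized significand 1.101, which is the implicit bit described above.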
The sizes of the components, in binary form, are as follows:
In every format the sign s is 1 bit: 0 → positive, 1 → negative.
An integer bit j marked "(implicit)" is not stored: it is 1 for normalized values, 0 for subnormal values.

| name, storage size | exponent E: width | exponent E: (bias) | exponent E: range | significand: integer j | significand: fraction f | 754R name |
|---|---|---|---|---|---|---|
| (not in IEEE-754) 16 bits | 5 bits | (15) | -14 ≤ E ≤ 15 | (implicit) | 10 bits | binary16 |
| "single precision" 32 bits | 8 bits | (127) | -126 ≤ E ≤ 127 | (implicit) | 23 bits | binary32 |
| "double precision" 64 bits | 11 bits | (1023) | -1022 ≤ E ≤ 1023 | (implicit) | 52 bits | binary64 |
| (SPARC "double-extended precision") 128 bits | 15 bits | (16383) | -16382 ≤ E ≤ 16383 | (implicit) | 112 bits | binary128 |
| "single extended precision" (≥ 43 bits) | ≥11 bits | | support -1022 ≤ E ≤ 1023 or more | | ≥32 bits, normalized | (replaced by binaryX ?) |
| "double extended precision" (≥ 79 bits) | ≥15 bits | | support -16382 ≤ E ≤ 16383 or more | | ≥64 bits, normalized | (replaced by binaryX ?) |
| (x86 "double-extended precision") 80 bits | 15 bits | (16383) | | 1 bit (explicit) | 63 bits | (replaced by binaryX ?) |
| (used in Itanium?) 82 bits | 17 bits | (65535 ?) | | (implicit) | 64 bits | ( ? ) |
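The biases and exponent ranges listed for the fixed-size formats follow one pattern: for an exponent field w bits wide, the bias is 2^(w-1) - 1, and because the all-zeros and all-ones patterns are reserved for special values, the usable range runs from 1 - bias to bias. A short sketch that reproduces the tabulated numbers (the function name exponent_limits is hypothetical):

```python
def exponent_limits(width):
    """Return (bias, e_min, e_max) for an exponent field of `width` bits."""
    bias = 2 ** (width - 1) - 1
    e_min = 1 - bias   # stored value 1; stored value 0 is reserved
    e_max = bias       # stored value 2**width - 2; all-ones is reserved
    return bias, e_min, e_max

for width in (5, 8, 11, 15):
    print(width, exponent_limits(width))
```

For width 8 this gives bias 127 and the range -126 to 127, matching the "single precision" row.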
Notes:
– means "this bit isn't used"
000… means "all bits equal 0"
111… means "all bits equal 1"
bbb… is a string of bits, not all zero
xxx… is any string of bits (possibly all zero)

| name | s | E | f | meaning |
|---|---|---|---|---|
| Zero | 0, 1 | 000… | .000… | Exactly zero (+0 and -0 are distinct, but equal) |
| Subnormal (denormalized) | 0, 1 | 000… | .bbb… | Very small numbers: minimum exponent, with significand < 1.000… (the implicit integer significand bit j is 0, not 1) |
| (positive) Infinity | 0 | 111… | .000… | Any positive number whose magnitude exceeds the format limit |
| (negative) Infinity | 1 | 111… | .000… | Any negative number whose magnitude exceeds the format limit |
| NaN (qNaN) | – | 111… | .1xxx… | Quiet "Not-a-Number" is produced by some operations with undefined outputs, e.g. 0/0. |
| NaN (sNaN) | – | 111… | .0bbb… | Signalling "Not-a-Number" represents a value that should generate a machine exception if it is used in an (arithmetic) operation. |
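The special-value rules can be read as a decision procedure on the exponent and fraction fields. The following is a minimal sketch for the 32-bit format; the function name classify_binary32 is hypothetical, and the quiet/signalling test follows the leading-fraction-bit convention described above, which is an assumption about the hardware rather than something every implementation shares.

```python
def classify_binary32(bits):
    """Name the kind of value encoded by a 32-bit pattern given as an int."""
    exponent = (bits >> 23) & 0xFF   # 8-bit stored exponent
    fraction = bits & 0x7FFFFF       # 23-bit fraction
    if exponent == 0:
        return "zero" if fraction == 0 else "subnormal"
    if exponent == 0xFF:
        if fraction == 0:
            return "infinity"
        # Assumed convention: leading fraction bit 1 -> quiet, 0 -> signalling.
        return "qNaN" if fraction >> 22 else "sNaN"
    return "normal"

for pattern in (0x00000000, 0x00000001, 0x40D00000, 0x7F800000, 0x7FC00000):
    print(f"{pattern:08X}", classify_binary32(pattern))
```

The sign bit plays no part in the classification, which is why +0 and -0, or the two infinities, land in the same class.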
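32-bit standard examples, copied from PSC (each pattern is written as sign, exponent, fraction; the significands on the right-hand sides are in binary):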
0 00000000 00000000000000000000000 = 0
1 00000000 00000000000000000000000 = -0
0 11111111 00000000000000000000000 = Infinity
1 11111111 00000000000000000000000 = -Infinity
0 11111111 00000100000000000000000 = NaN (signaling)
1 11111111 00100010001001010101010 = NaN (signaling)
0 11111111 10000000000000000000000 = NaN (quiet)
1 11111111 10100011010101000001010 = NaN (quiet)
0 10000000 00000000000000000000000 = +1 * 2**(128-127) * 1.0 = 2
0 10000001 10100000000000000000000 = +1 * 2**(129-127) * 1.101 = 6.5
1 10000001 10100000000000000000000 = -1 * 2**(129-127) * 1.101 = -6.5
0 00000001 00000000000000000000000 = +1 * 2**(1-127) * 1.0 = 2**(-126)
0 00000000 10000000000000000000000 = +1 * 2**(-126) * 0.1 = 2**(-127)
0 00000000 00000000000000000000001 = +1 * 2**(-126) * 0.00000000000000000000001
= 2**(-149) (Smallest positive value)
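The examples above can be checked mechanically. The sketch below, assuming Python's standard struct module, turns a "sign exponent fraction" string of 1 + 8 + 23 binary digits back into a number; the function name decode_binary32 is hypothetical.

```python
import struct

def decode_binary32(text):
    """Decode 32 binary digits (spaces allowed between fields) to a float."""
    digits = text.replace(" ", "")
    if len(digits) != 32 or set(digits) - {"0", "1"}:
        raise ValueError("expected exactly 32 binary digits")
    # Reinterpret the 4 bytes as a big-endian 32-bit float.
    (value,) = struct.unpack(">f", int(digits, 2).to_bytes(4, "big"))
    return value

print(decode_binary32("0 10000001 10100000000000000000000"))  # 6.5
print(decode_binary32("0 00000000 00000000000000000000001") == 2.0 ** -149)  # True
```

Python floats are 64-bit, so every 32-bit value, including the smallest subnormal, is held exactly after decoding.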
See the reference
steve.hollasch.net/cgindex/coding/ieeefloat.html
for more information.
Changing IEEE-754:
The original form of the standard, "IEEE 754-1985", specifies base-2 representations.
An update, "IEEE 854-1987", specifies radix-independent representations.
As of Fall 2006, a working group has been considering revisions to IEEE 754 (and IEEE 854).
References for this working group are given below.
Conversion
web.bvu.edu/faculty/traylor/CS_Help_Stuff/Floating_point_representation.htm
Step-by-step procedure for converting a decimal number into IEEE-754 format (32-bit or 64-bit). By Dr. J Traylor, Buena Vista University.
Descriptions of Floating-Point, IEEE-754, IEEE-854, Revision 754r
www.validlab.com/754R/standards/754xml.html
The 1985 edition of the standard.
local copy: 754xml.html
standards.ieee.org/reading/ieee/std_public/description/busarch/854-1987_desc.html
"ANSI/IEEE Std 854-1987 IEEE Standard for Radix-Independent Floating-Point Arithmetic -Description" — table of contents and a link for purchasing.
IEEE 754R Decimal Floating-Point Arithmetic: Reliable and Efficient Implementation for Intel© Architecture Platforms
Discussion and description of Intel's approach to the decimal formats in the proposed IEEE-754r revision.
www.savrola.com/resources/IEEE854.html
"IEEE 854-1987 is the IEEE Standard for Radix-Independent Floating-Point Arithmetic.... " a description of IEEE-854 and how it compares to IEEE-754.
local copy: IEEE854.html
www.validlab.com/754R/drafts/754r.html
A draft of the revised standard.
local copy: draft-754r.html
standards.ieee.org/reading/ieee/interp/754-1985.html
IEEE-754's interpretation of division by zero.
local copy: standards-interp.754-1985.html
http://www.freesoft.org/CIE/RFC/1832/32.htm
"APPENDIX A: ANSI/IEEE Standard 754-1985" Tabular (monospaced) listing of data formats.
local copy: ANSI-IEEE_standard_754-1985.html
steve.hollasch.net/cgindex/coding/ieeefloat.html
"IEEE Standard 754 Floating Point Numbers", Steve Hollasch: a description of the standard. Includes descriptions of special values.
www.psc.edu/general/software/packages/ieee/ieee.html
From the Pittsburgh Supercomputing Center.
docs.sun.com/source/806-3568/ncg_math.html#719
Sun documentation of floating-point hardware and arithmetic (see Goldberg, below)
developer.intel.com/technology/itj/q41999/articles/art_6.htm
Intel Corporation: a detailed description of an implementation.
http://www2.hursley.ibm.com/decimal/IEEE-cowlishaw-arith16.pdf (needs a pdf-viewer)
Decimal Floating-Point: Algorism for Computers , Michael F. Cowlishaw.
A proposed implementation of IEEE-854.
en.wikipedia.org/wiki/IEEE_floating-point_standard
Wikipedia overview.
History and Background information
www.cs.berkeley.edu/~wkahan/ieee754status/754story.html
"An Interview with the Old Man of Floating-Point" --- some history from William Kahan
local copy: WKahan.754story.html
www.validlab.com/goldberg/addendum.html
David Goldberg: "What Every Computer Scientist Should Know About Floating-Point Arithmetic"
home.earthlink.net/~mrob/pub/math/floatformats.html
A survey of formats and implementations of floating-point numbers (includes references.)
www.netbsd.org/People/Pages/ross-essays.html#ieee-754
A contrarian view (very contrarian!)
"So I conclude that 754 is a virus, infecting individual programs, and making them unable to run on non-IEEE-754 hardware."
www2.hursley.ibm.com/decimal/854mins.html
Links to "ANSI/IEEE 854 — History and Minutes".
grouper.ieee.org/groups/754/meeting-minutes/01-11-15-old.html
A glimpse at how such standards are created --- a random set of 754r meeting minutes.
IEEE-754 Revision efforts
grouper.ieee.org/groups/754/revision.html
Official statement of purpose, meeting schedule and minutes.
en.wikipedia.org/wiki/IEEE_754r
Wikipedia entry (of course).
www.validlab.com/754R/
Balloting (from Spring 2007) and information about the draft proposal and the working group.
www.cs.berkeley.edu/~ejr/Projects/ieee754/
Some remarks about the revision effort by E. Jason Riedy (a participant?)
(Fall 2007 — page no longer available?)
www.cs.berkeley.edu/~ejr/Projects/ieee754/revision.html
Working group's statement of purpose, meeting schedule and minutes.
(Fall 2007 — page no longer available?)
Other Lists of References
cch.loria.fr/documentation/IEEE754/
Many references, mostly in English.
babbage.cs.qc.edu/courses/cs341/IEEE-754references.html
A page of references, including conversion demonstrations.
Homepage:
montcs.bloomu.edu/~bobmon/
File last modified
Monday, 22-Oct-2007 16:41:45 EDT
© 2004-2007 Robert Montante unless otherwise indicated. All rights reserved.