Floating-Point Representation and IEEE-754
A real-valued number is represented in a floating-point format as:
(-1)^Sign × Significand × Base^Exponent
where:
Sign is 0 for positive values, 1 for negative values.
Significand is a real number, written as integer.fraction.
(Also known as "mantissa".)
Base is an integer value, the radix in which the number is expressed.
Typically either 10 (for people) or 2 (for computers) (or 16 for some IBM computers).
Exponent is an integer value.
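As a small illustration of the formula above, the following Python sketch evaluates a (sign, significand, base, exponent) tuple exactly. The function name fp_value and the use of fractions.Fraction are choices made for this example only; they are not part of any standard.

```python
from fractions import Fraction

def fp_value(sign, significand, base, exponent):
    """Evaluate (-1)^sign * significand * base^exponent exactly."""
    return (-1) ** sign * Fraction(significand) * Fraction(base) ** exponent

# 6.5 in base 2: significand 1.625 (binary 1.101), exponent 2
print(fp_value(0, Fraction(13, 8), 2, 2))    # 13/2
# -6.5 in base 10: significand 6.5, exponent 0
print(fp_value(1, Fraction(13, 2), 10, 0))   # -13/2
```

Exact rational arithmetic is used here so that the result is not itself subject to floating-point rounding.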
This page was originally created before the 2008 revision of 754 was finalized.
The revision is referred to as "754R" below. Some minor details, such as format names, may have changed.
This page will be updated as soon as time permits.
IEEE Standard 754-1985 specifies binary representations for floating point numbers:
As of Fall 2007, it is expected to be updated by IEEE-754r.
Some of the anticipated changes/additions are noted in the table below.
The Sign is one bit — 0 for positive values, 1 for negative values.
The Exponent is in an excess-N notation: the stored field is the true exponent plus a fixed bias N.
Exponents of 0, or of maximal value (i.e., all bits of the binary representation are 1), denote special values.
The Base is specified to be 2. (IEEE-854 broadens this, accepting 10 as an alternate base.)
The Significand is normalized except for very small (subnormal) values. In normalized form the integer portion (to the left of the radix point) is exactly one bit, and that bit is 1.
The normalized significand's range is thus (1.000…) up to (1.111…), in base-2.
A similar range for normalized base-10 numbers is (1.000…) to (9.999…). (Or in mathematical notation [1, 10).)
The subnormal significands range from (0.111…) down to (0.00…01), in base-2.
The leftmost (integer) bit is not stored in the standard formats, but instead is implicit (assumed to be 1 for normalized values, 0 for subnormal values).
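The field layout described above can be inspected directly. The sketch below, which assumes the binary32 layout (1 sign bit, 8 exponent bits, 23 fraction bits), splits a value into its stored fields and restores the implicit integer bit. The helper names binary32_fields and significand are invented for this example.

```python
import struct

def binary32_fields(x):
    """Return (sign, biased_exponent, fraction) for x rounded to binary32."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31
    biased_exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    return sign, biased_exponent, fraction

def significand(biased_exponent, fraction):
    """Rebuild the significand of a finite value, restoring the implicit bit j.

    Not meaningful when the exponent field is all ones (Infinity, NaN).
    """
    j = 0 if biased_exponent == 0 else 1   # zero/subnormal: 0, normalized: 1
    return j + fraction / 2 ** 23

s, e, f = binary32_fields(6.5)
print(s, e - 127, significand(e, f))   # 0 2 1.625
```

Subtracting 127 removes the excess-N bias, so 6.5 appears as +1.625 × 2^2.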
The sizes of the components, in binary form, are as follows:

name (format, storage size)                     sign s   exponent E: width   exponent E: (bias), range     significand: integer j   significand: fraction f
binary16 (not in IEEE-754)                      1 bit    5 bits              (15), -14 ≤ E ≤ 15            (implicit)               10 bits
binary32 "single precision"                     1 bit    8 bits              (127), -126 ≤ E ≤ 127         (implicit)               23 bits
binary64 "double precision"                     1 bit    11 bits             (1023), -1022 ≤ E ≤ 1023      (implicit)               52 bits
binary128 (SPARC "double-extended precision")   1 bit    15 bits             (16383), -16382 ≤ E ≤ 16383   (implicit)               112 bits

Sign s: 0 → positive, 1 → negative.
Integer j (implicit): 1 for normalized values, 0 for subnormal values.

name (format, storage size)   sign s   exponent E: width   exponent E: (bias), range     significand: integer j   significand: fraction f
decimal32                     1 bit    8 bits              -95 ≤ E ≤ 96                                           23 bits
decimal64                     1 bit    10 bits             -383 ≤ E ≤ 384                                         53 bits
decimal128                    1 bit    14 bits             -6143 ≤ E ≤ 6144                                       113 bits

name (format, storage size)                   exponent E: width   exponent E: (bias), range             significand: integer j, fraction f      notes
"single extended precision" (≥ 43 bits)       ≥ 11 bits           support -1022 ≤ E ≤ 1023 or more      ≥ 32 bits, normalized                   (replaced by binaryX?)
"double extended precision" (≥ 79 bits)       ≥ 15 bits           support -16382 ≤ E ≤ 16383 or more    ≥ 64 bits, normalized                   (replaced by binaryX?)
(x86 "double-extended precision") 80 bits     15 bits             (16383)                               j: 1 bit (explicit); f: 63 bits         (replaced by binaryX?)
(used in Itanium?) 82 bits                    17 bits             (65535 ?)                             j: (implicit as above); f: 64 bits      (?)
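For the binary formats, the bias and exponent range in the table follow from the exponent field width alone: the bias is 2^(width-1) - 1, and the all-zeros and all-ones field values are reserved. The following sketch reproduces the four binary rows; exponent_parameters is a name made up for this example, and the rule is not meant to apply to the decimal rows.

```python
def exponent_parameters(width):
    """Bias and normalized exponent range for a binary exponent field of `width` bits."""
    bias = 2 ** (width - 1) - 1
    e_min = 1 - bias    # smallest normalized exponent (stored field value 1)
    e_max = bias        # largest normalized exponent (stored field value 2**width - 2)
    return bias, e_min, e_max

for name, width in [("binary16", 5), ("binary32", 8),
                    ("binary64", 11), ("binary128", 15)]:
    print(name, exponent_parameters(width))
# binary16 (15, -14, 15)
# binary32 (127, -126, 127)
# binary64 (1023, -1022, 1023)
# binary128 (16383, -16382, 16383)
```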
Bit patterns whose exponent field is all zeros or all ones have special values or meanings:

Notes:
– means "this bit isn't meaningful"
000… means "all bits equal 0"
111… means "all bits equal 1"
bbb… means "arbitrary bitstring, not all 0s"

name                       s     E      f         meaning
Zero                       0,1   000…   .000…     Exactly zero (+0 and -0 are distinct, but compare equal)
Subnormal (denormalized)   0,1   000…   .bbb…     Very small numbers: minimum exponent, with significand < 1.000… (the implicit integer significand bit j is 0, not 1)
(positive) Infinity        0     111…   .000…     Any positive number whose magnitude exceeds the format limit
(negative) Infinity        1     111…   .000…     Any negative number whose magnitude exceeds the format limit
NaN (qNaN)                 –     111…   .1bbb…    Quiet "Not-a-Number" is produced by some operations with undefined outputs, e.g. 0/0. (For a quiet NaN the bits after the leading 1 may all be 0.)
NaN (sNaN)                 –     111…   .0bbb…    Signaling "Not-a-Number" represents a value that should generate a machine exception if it is used in an (arithmetic) operation.
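The table above translates almost line for line into a classifier. The sketch below assumes the binary32 layout and the quiet/signaling convention shown in the table (leading fraction bit 1 for quiet, 0 for signaling). The function name classify_binary32 and the returned labels are invented for this example.

```python
def classify_binary32(bits):
    """Name the class of a 32-bit pattern, following the special-values table."""
    sign = "-" if bits >> 31 else "+"
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if exponent == 0:                       # E = 000...
        return sign + ("zero" if fraction == 0 else "subnormal")
    if exponent == 0xFF:                    # E = 111...
        if fraction == 0:
            return sign + "infinity"
        # The sign bit is not meaningful for NaNs; test the leading fraction bit.
        return "qNaN" if fraction >> 22 else "sNaN"
    return sign + "normal"

print(classify_binary32(0x80000000))   # -zero
print(classify_binary32(0x00000001))   # +subnormal
print(classify_binary32(0x7F800000))   # +infinity
print(classify_binary32(0x7FC00000))   # qNaN
print(classify_binary32(0x7F820000))   # sNaN
```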
32-bit standard examples, copied from PSC:
0 00000000 00000000000000000000000 = 0
1 00000000 00000000000000000000000 = -0
0 11111111 00000000000000000000000 = Infinity
1 11111111 00000000000000000000000 = -Infinity
0 11111111 00000100000000000000000 = NaN (signaling)
1 11111111 00100010001001010101010 = NaN (signaling)
0 11111111 10000000000000000000000 = NaN (quiet)
1 11111111 10100011010101000001010 = NaN (quiet)
0 10000000 00000000000000000000000 = +1 * 2**(128-127) * 1.0 = 2
0 10000001 10100000000000000000000 = +1 * 2**(129-127) * 1.101 = 6.5
1 10000001 10100000000000000000000 = -1 * 2**(129-127) * 1.101 = -6.5
0 00000001 00000000000000000000000 = +1 * 2**(1-127) * 1.0 = 2**(-126)
0 00000000 10000000000000000000000 = +1 * 2**(-126) * 0.1 = 2**(-127)
0 00000000 00000000000000000000001 = +1 * 2**(-126) * 0.00000000000000000000001
= 2**(-149) (Smallest positive value)
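The arithmetic in these examples can be checked mechanically. The sketch below takes a bit pattern written as three space-separated fields (sign, 8-bit exponent, 23-bit fraction) and computes its exact value as a rational number. The function name binary32_from_text is an assumption of this example, as is the choice to return None for Infinity and NaN patterns.

```python
from fractions import Fraction

def binary32_from_text(text):
    """Exact value of a 'sign exponent fraction' bit string; None for Infinity/NaN."""
    sign_bits, exponent_bits, fraction_bits = text.split()
    sign = int(sign_bits, 2)
    biased = int(exponent_bits, 2)
    fraction = Fraction(int(fraction_bits, 2), 2 ** 23)
    if biased == 0xFF:
        return None                                   # Infinity or NaN
    if biased == 0:
        magnitude = fraction * Fraction(2) ** -126    # subnormal or zero: j = 0
    else:
        magnitude = (1 + fraction) * Fraction(2) ** (biased - 127)   # normalized: j = 1
    # Note: a Fraction cannot carry the sign of zero, so -0 comes back as 0.
    return -magnitude if sign else magnitude

print(binary32_from_text("0 10000001 10100000000000000000000"))   # 13/2
print(binary32_from_text("0 00000000 00000000000000000000001")
      == Fraction(1, 2 ** 149))                                    # True
```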
See the reference
steve.hollasch.net/cgindex/coding/ieeefloat.html
for more information.
Changing IEEE-754:
The original form of the standard, "IEEE 754-1985", specifies base-2 representations.
An update, "IEEE 854-1987", specifies radix-independent representations.
As of Fall 2006, a working group has been considering revisions to IEEE 754 (and IEEE 854).
References for this working group are given below.
Implementation
grouper.ieee.org/groups/754/reading.html#software
Reading list from IEEE on (software) implementations of the standard.
Excerpt:
Linux provides software assistance and GNU libc provides support routines.
The Linux code is also a good example, but the GNU libc code is an exercise in obfuscation.
Conversion
web.bvu.edu/faculty/traylor/CS_Help_Stuff/Floating_point_representation.htm
Step-by-step procedure for converting a decimal number into IEEE-754 format (32-bit or 64-bit). By Dr. J Traylor, Buena Vista University.
Descriptions of Floating-Point, IEEE-754, IEEE-854, Revision 754r
www.validlab.com/754R/standards/754xml.html
The 1985 edition of the standard.
local copy: 754xml.html
standards.ieee.org/reading/ieee/std_public/description/busarch/854-1987_desc.html
"ANSI/IEEE Std 854-1987 IEEE Standard for Radix-Independent Floating-Point Arithmetic -Description" — table of contents and a link for purchasing.
IEEE 754R Decimal Floating-Point Arithmetic: Reliable and Efficient Implementation for Intel© Architecture Platforms
Discussion and description of Intel's approach to the decimal formats in the proposed IEEE-754r revision.
www.savrola.com/resources/IEEE854.html
"IEEE 854-1987 is the IEEE Standard for Radix-Independent Floating-Point Arithmetic.... " a description of IEEE-854 and how it compares to IEEE-754.
local copy: IEEE854.html
www.validlab.com/754R/drafts/754r.html
A draft of the revised standard.
local copy: draft-754r.html
standards.ieee.org/reading/ieee/interp/754-1985.html
IEEE-754's interpretation of division by zero.
local copy: standards-interp.754-1985.html
http://www.freesoft.org/CIE/RFC/1832/32.htm
"APPENDIX A: ANSI/IEEE Standard 754-1985" Tabular (monospaced) listing of data formats.
local copy: ANSI-IEEE_standard_754-1985.html
steve.hollasch.net/cgindex/coding/ieeefloat.html
"IEEE Standard 754 Floating Point Numbers", Steve Hollasch: a description of the standard. Includes descriptions of special values.
www.psc.edu/general/software/packages/ieee/ieee.html
From the Pittsburgh Supercomputing Center.
docs.sun.com/source/806-3568/ncg_math.html#719
Sun documentation of floating-point hardware and arithmetic (see Goldberg, below)
developer.intel.com/technology/itj/q41999/articles/art_6.htm
Intel Corporation: a detailed description of an implementation.
http://www2.hursley.ibm.com/decimal/IEEE-cowlishaw-arith16.pdf (needs a pdf-viewer)
Decimal Floating-Point: Algorism for Computers , Michael F. Cowlishaw.
A proposed implementation of IEEE-854.
en.wikipedia.org/wiki/IEEE_floating-point_standard
Wikipedia overview.
History and Background information
www.cs.berkeley.edu/~wkahan/ieee754status/754story.html
"An Interview with the Old Man of Floating-Point" --- some history from William Kahan
local copy: WKahan.754story.html
www.validlab.com/goldberg/addendum.html
David Goldberg: "What Every Computer Scientist Should Know About Floating-Point Arithmetic"
home.earthlink.net/~mrob/pub/math/floatformats.html
A survey of formats and implementations of floating-point numbers (includes references.)
www.netbsd.org/People/Pages/ross-essays.html#ieee-754
A contrarian view (very contrarian!)
"So I conclude that 754 is a virus, infecting individual programs, and making them unable to run on non-IEEE-754 hardware."
www2.hursley.ibm.com/decimal/854mins.html
Links to "ANSI/IEEE 854 — History and Minutes".
grouper.ieee.org/groups/754/meeting-minutes/01-11-15-old.html
A glimpse at how such standards are created --- a random set of 754r meeting minutes.
IEEE-754 Revision efforts
grouper.ieee.org/groups/754/revision.html
Official statement of purpose, meeting schedule and minutes.
en.wikipedia.org/wiki/IEEE_754r
Wikipedia entry (of course).
www.validlab.com/754R/
Balloting (from Spring 2007) and information about the draft proposal and the working group.
www.cs.berkeley.edu/~ejr/Projects/ieee754/
Some remarks about the revision effort by E. Jason Riedy (a participant?)
(Fall 2007 — page no longer available?)
www.cs.berkeley.edu/~ejr/Projects/ieee754/revision.html
Working group's statement of purpose, meeting schedule and minutes.
(Fall 2007 — page no longer available?)
Other Lists of References
cch.loria.fr/documentation/IEEE754/
Many references, mostly in English.
babbage.cs.qc.edu/courses/cs341/IEEE-754references.html
A page of references, including conversion demonstrations.
Homepage:
montcs.bloomu.edu/
© 2004-2009 Robert Montante unless otherwise indicated. All rights reserved.