# Floating-Point Representation and IEEE-754

## In general:

A real-valued number is represented in a floating-point format as:

$(-1)^{\mathrm{Sign}} \times \mathrm{Significand} \times \mathrm{Base}^{\mathrm{Exponent}}$

where:

• Sign is 0 for positive values, 1 for negative values.
• Significand is a real number, composed as integer.fraction. (Also known as "mantissa".)
• Base is an integer value, the radix in which the significand is written. Typically either 10 (for people) or 2 (for computers) (or 16 for some IBM computers).
• Exponent is an integer value.
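
As a minimal sketch of the general formula, the following Python function evaluates a (sign, significand, base, exponent) quadruple exactly, using rational arithmetic. The name `fp_value` is made up for this illustration and is not part of any standard.

```python
from fractions import Fraction

def fp_value(sign, significand, base, exponent):
    """Evaluate (-1)^sign * significand * base^exponent exactly."""
    return (-1) ** sign * Fraction(significand) * Fraction(base) ** exponent

# 6.5 written in base 2: significand 1.625 (binary 1.101), exponent 2
print(fp_value(0, Fraction(13, 8), 2, 2))    # 13/2
# -6.5 written in base 10: significand 6.5, exponent 0
print(fp_value(1, Fraction(13, 2), 10, 0))   # -13/2
```

The same value can be written in more than one base, which is why the base must be fixed by the format.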

## Standard IEEE-754:

$(-1)^{\mathrm{Sign}} \times \mathrm{Significand} \times \mathrm{Base}^{\mathrm{Exponent}}$

• The Sign is one bit — 0 for positive values, 1 for negative values.
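• The Exponent is stored in excess-N (biased) notation: the stored field is the true exponent plus a fixed bias N, so the field itself is an unsigned integer.
• Exponent fields of 0, or of maximal value (i.e., all bits of the binary representation are 1), denote special values.
• The Base is specified to be 2 for the binary formats. (The decimal formats use 10 as the base.)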
• The Significand is normalized except for very small (subnormal) values. In normalized form the integer portion (to the left of the radix point) is exactly one bit.
• That leftmost (integer) bit is not stored in the standard formats; it is implicit (assumed to be 1 for normalized values, 0 for subnormal values).
• The normalized significand's range is thus (1.000…) up to (1.111…), in base-2.
• A similar range for normalized base-10 numbers is (1.000…) to (9.999…). (Or in mathematical notation [1, 10).)
• The subnormal significands range from (0.111…) down to (0.00…01), in base-2.
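
The rules above can be sketched for the 32-bit format (1 sign bit, 8 exponent bits with bias 127, 23 fraction bits). The function name `decode_binary32` is hypothetical, and the sketch deliberately handles only finite values.

```python
def decode_binary32(bits):
    """Return the value of a 32-bit pattern (finite cases only)."""
    sign = bits >> 31
    stored_exp = (bits >> 23) & 0xFF      # excess-127 exponent field
    fraction = bits & 0x7FFFFF            # 23 stored fraction bits
    if stored_exp == 0xFF:
        raise ValueError("infinity or NaN: not handled in this sketch")
    if stored_exp == 0:
        # zero or subnormal: implicit integer bit is 0, exponent fixed at -126
        significand = fraction / 2**23
        exponent = -126
    else:
        # normalized: implicit integer bit is 1, exponent = field - bias
        significand = 1 + fraction / 2**23
        exponent = stored_exp - 127
    return (-1) ** sign * significand * 2.0 ** exponent

print(decode_binary32(0x40D00000))   # 6.5
```

Python floats are 64-bit, so every finite 32-bit value, including the subnormals, is computed here without rounding.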

### Data formats:

The sizes of the components, in binary form, are as follows:

| Name (format, storage size) | Sign s | Exponent E: width (bias), range | Significand: integer j | Significand: fraction f |
|---|---|---|---|---|
| binary16 (storage-only format) | 1 bit: 0 → positive, 1 → negative | 5 bits (15), -14 ≤ E ≤ 15 | (implicit) normalized: 1, subnormal: 0 | 10 bits |
| binary32 "single precision" | 1 bit | 8 bits (127), -126 ≤ E ≤ 127 | (implicit) | 23 bits |
| binary64 "double precision" | 1 bit | 11 bits (1023), -1022 ≤ E ≤ 1023 | (implicit) | 52 bits |
| binary128 (SPARC "double-extended precision") | 1 bit | 15 bits (16383), -16382 ≤ E ≤ 16383 | (implicit) | 112 bits |

| Name (format, storage size) | Sign s | Exponent E: range | Significand |
|---|---|---|---|
| decimal32 | 1 bit | -95 ≤ E ≤ 96 | 7 decimal digits |
| decimal64 | 1 bit | -383 ≤ E ≤ 384 | 16 decimal digits |
| decimal128 | 1 bit | -6143 ≤ E ≤ 6144 | 34 decimal digits |

| Name | Storage size | Exponent E: width (bias), range | Significand (integer j, fraction f) | Notes |
|---|---|---|---|---|
| "single extended precision" | ≥ 43 bits | ≥ 11 bits, support -1022 ≤ E ≤ 1023 or more | ≥ 32 bits, normalized | (replaced by binaryX ?) |
| "double extended precision" | ≥ 79 bits | ≥ 15 bits, support -16382 ≤ E ≤ 16383 or more | ≥ 64 bits, normalized | (replaced by binaryX ?) |
| (x86 "double-extended precision") | 80 bits | 15 bits (16383) | j: 1 bit (explicit), f: 63 bits | (replaced by binaryX ?) |
| (used in Itanium?) | 82 bits | 17 bits (65535 ?) | j: (implicit as above), f: 64 bits | ( ? ) |
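
The bias and range figures for the binary formats follow one pattern: for an exponent field of w bits, the bias is 2^(w-1) - 1, and the two reserved field values (all 0s and all 1s) leave 1 - bias through bias as the usable exponents. A small sketch that reproduces those figures, with `exponent_parameters` as an illustrative name only:

```python
def exponent_parameters(width):
    """Return (bias, emin, emax) for a binary format with `width` exponent bits."""
    bias = 2 ** (width - 1) - 1
    emax = bias        # the all-1s field is reserved, so the top usable field is 2*bias
    emin = 1 - bias    # the all-0s field is reserved, so the bottom usable field is 1
    return bias, emin, emax

for name, width in [("binary16", 5), ("binary32", 8),
                    ("binary64", 11), ("binary128", 15)]:
    print(name, exponent_parameters(width))
```

This covers only the binary interchange formats; the decimal formats encode their exponents differently.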

### Special Values:

Bit patterns whose exponent field is all zeros or all ones have special values or meanings:

Notation used in the table:

• x means "this bit isn't meaningful"
• 000… means "all bits equal 0"
• 111… means "all bits equal 1"
• bbb… means "arbitrary bitstring, not all 0s"

| Name | s | E | f | Meaning |
|---|---|---|---|---|
| Zero | 0,1 | 000… | .000… | Exactly zero (`+0` and `-0` are distinct bit patterns, but compare equal) |
| Subnormal (denormalized) | 0,1 | 000… | .bbb… | Very small numbers: minimum exponent, with significand < `1.000…` (the implicit integer significand bit j is 0, not 1) |
| (positive) Infinity | 0 | 111… | .000… | Any positive number whose magnitude exceeds the format limit |
| (negative) Infinity | 1 | 111… | .000… | Any negative number whose magnitude exceeds the format limit |
| NaN (qNaN) | x | 111… | .1… | Quiet "Not-a-Number" is produced by some operations with undefined outputs, e.g. `0/0`. The fraction bits after the leading 1 are arbitrary. |
| NaN (sNaN) | x | 111… | .0bbb… | Signalling "Not-a-Number" represents a value that should generate a machine exception if it is used in an (arithmetic) operation. |
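
The table can be read as a decision procedure on the exponent and fraction fields. A sketch for the 32-bit format follows; `classify_binary32` is a hypothetical helper, and it assumes the convention shown above in which a leading fraction bit of 1 marks a quiet NaN.

```python
def classify_binary32(bits):
    """Name the kind of value a 32-bit pattern encodes."""
    sign = bits >> 31
    exp = (bits >> 23) & 0xFF     # 8-bit exponent field
    frac = bits & 0x7FFFFF        # 23-bit fraction field
    if exp == 0:
        return "zero" if frac == 0 else "subnormal"
    if exp == 0xFF:
        if frac == 0:
            return "-infinity" if sign else "+infinity"
        # leading fraction bit set: quiet NaN, otherwise signalling NaN
        return "qNaN" if frac >> 22 else "sNaN"
    return "normal"

print(classify_binary32(0x7F800000))   # +infinity
print(classify_binary32(0x7FC00000))   # qNaN
```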

32-bit standard examples, copied from PSC:

0 00000000 00000000000000000000000 = 0
1 00000000 00000000000000000000000 = -0

0 11111111 00000000000000000000000 = Infinity
1 11111111 00000000000000000000000 = -Infinity

0 11111111 00000100000000000000000 = NaN (signaling)
1 11111111 00100010001001010101010 = NaN (signaling)
0 11111111 10000000000000000000000 = NaN (quiet)
1 11111111 10100011010101000001010 = NaN (quiet)

0 10000000 00000000000000000000000 = +1 * 2**(128-127) * 1.0 = 2
0 10000001 10100000000000000000000 = +1 * 2**(129-127) * 1.101 = 6.5
1 10000001 10100000000000000000000 = -1 * 2**(129-127) * 1.101 = -6.5

0 00000001 00000000000000000000000 = +1 * 2**(1-127) * 1.0 = 2**(-126)
0 00000000 10000000000000000000000 = +1 * 2**(-126) * 0.1 = 2**(-127)

0 00000000 00000000000000000000001 = +1 * 2**(-126) * 0.00000000000000000000001
= 2**(-149) (Smallest positive value)
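
These bit patterns can be checked mechanically. The sketch below uses Python's standard `struct` module to reinterpret a pattern, written in the same spaced layout as the examples, as a 32-bit float; the helper name `bits_to_float` is an assumption of this illustration.

```python
import struct

def bits_to_float(pattern):
    """Interpret a string such as '0 10000001 1010...' as a binary32 value."""
    bits = int(pattern.replace(" ", ""), 2)
    # pack as a big-endian unsigned 32-bit integer, unpack as a big-endian float
    return struct.unpack(">f", struct.pack(">I", bits))[0]

print(bits_to_float("0 10000001 10100000000000000000000"))                  # 6.5
print(bits_to_float("0 00000000 00000000000000000000001") == 2.0 ** -149)   # True
```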

## Bibliography/References:

### Changing IEEE-754:

The original form of the standard, "IEEE 754-1985", specifies base-2 representations. An update, "IEEE 854-1987", specifies radix-independent representations. As of Fall 2006, a working group has been considering revisions to IEEE 754 (and IEEE 854). References for this working group are given below.

### Implementation

Reading list from IEEE on (software) implementations of the standard.
Excerpt:
Linux provides software assistance and GNU libc provides support routines. The Linux code is also a good example, but the GNU libc code is an exercise in obfuscation.

### Conversion

web.bvu.edu/faculty/traylor/CS_Help_Stuff/Floating_point_representation.htm
Step-by-step procedure for converting a decimal number into IEEE-754 format (32-bit or 64-bit). By Dr. J Traylor, Buena Vista University.

### Descriptions of Floating-Point, IEEE-754, IEEE-854, Revision 754r

www.validlab.com/754R/standards/754xml.html
The 1985 edition of the standard. local copy: 754xml.html
IEEE 754R Decimal Floating-Point Arithmetic: Reliable and Efficient Implementation for Intel® Architecture Platforms
Discussion and description of Intel's approach to the decimal formats in the proposed IEEE-754r revision.
www.savrola.com/resources/IEEE854.html
"IEEE 854-1987 is the IEEE Standard for Radix-Independent Floating-Point Arithmetic...." a description of IEEE-854 and how it compares to IEEE-754. local copy: IEEE854.html
www.validlab.com/754R/drafts/754r.html
A draft of the revised standard. local copy: draft-754r.html
IEEE-754's interpretation of division by zero. local copy: standards-interp.754-1985.html
http://www.freesoft.org/CIE/RFC/1832/32.htm
"APPENDIX A: ANSI/IEEE Standard 754-1985" Tabular (monospaced) listing of data formats. local copy: ANSI-IEEE_standard_754-1985.html
steve.hollasch.net/cgindex/coding/ieeefloat.html
"IEEE Standard 754 Floating Point Numbers", Steve Hollasch: a description of the standard. Includes descriptions of special values.
www.psc.edu/general/software/packages/ieee/ieee.html
From the Pittsburgh Supercomputing Center.
docs.sun.com/source/806-3568/ncg_math.html#719
Sun documentation of floating-point hardware and arithmetic (see Goldberg, below)
developer.intel.com/technology/itj/q41999/articles/art_6.htm
Intel Corporation: a detailed description of an implementation.
http://www2.hursley.ibm.com/decimal/IEEE-cowlishaw-arith16.pdf (needs a pdf-viewer)
Decimal Floating-Point: Algorism for Computers, Michael F. Cowlishaw. A proposed implementation of IEEE-854.
en.wikipedia.org/wiki/IEEE_floating-point_standard
Wikipedia overview.

### History and Background information

www.cs.berkeley.edu/~wkahan/ieee754status/754story.html
"An Interview with the Old Man of Floating-Point": some history from William Kahan. local copy: WKahan.754story.html
David Goldberg: "What Every Computer Scientist Should Know About Floating-Point Arithmetic"
A survey of formats and implementations of floating-point numbers (includes references.)
www.netbsd.org/People/Pages/ross-essays.html#ieee-754
A contrarian view (very contrarian!)
"So I conclude that 754 is a virus, infecting individual programs, and making them unable to run on non-IEEE-754 hardware."
www2.hursley.ibm.com/decimal/854mins.html
Links to "ANSI/IEEE 854 — History and Minutes".
grouper.ieee.org/groups/754/meeting-minutes/01-11-15-old.html
A glimpse at how such standards are created --- a random set of 754r meeting minutes.

### IEEE-754 Revision efforts

grouper.ieee.org/groups/754/revision.html
Official statement of purpose, meeting schedule and minutes.
en.wikipedia.org/wiki/IEEE_754r
Wikipedia entry (of course).
www.validlab.com/754R/
Balloting (from Spring 2007) and information about the draft proposal and the working group.
www.cs.berkeley.edu/~ejr/Projects/ieee754/
Some remarks about the revision effort by E. Jason Riedy (a participant?) (Fall 2007 — page no longer available?)
www.cs.berkeley.edu/~ejr/Projects/ieee754/revision.html
Working group's statement of purpose, meeting schedule and minutes. (Fall 2007 — page no longer available?)

### Other Lists of References

cch.loria.fr/documentation/IEEE754/
Many references, mostly in English.
babbage.cs.qc.edu/courses/cs341/IEEE-754references.html
A page of references, including conversion demonstrations.