Mass of an electron: 9x10e28 grams to Mass of the sun 2x10e33; range exceeds10e60
No way we can fit a range of 10e60 values in 8 bits, or even 32 bits. It would take about 200 bits.
So, we handle this in a computer the same way we do ourselves: scientific notation. Just translate the number into binary and represent it with a sign, an exponent, and a mantissa (fraction). We'll put all three parts in a single word (single precision), or in two words (double precision). The base will be fixed for all numbers, so there is no need to represent it explicitly.
The mantissa is typically represented in sign magnitude and normalized
so
that all of the positions can be used for precision.
Example using base 10:
6875 x 10e3 represented as 6.875 x 10e6 or .6875 x 10e7
The term floatingpoint is used because the position of the radix point is adjusted during computation to retain the normal form.
Example format:
 sign bit  7bit exponent  24bit fractional mantissa

Here is an intuitive way to convert a decimal value into a binary fraction.
Consider what the positions mean:
d1 d0 . d1 d2 d3 d4 d5 d6 d7 d8
2e1 2e0 . 2e1
2e2 2e3
2e4 2e5
2e6 2e7
2e8
2 1
. 1/2
1/4 1/8
1/16 1/32
1/64 1/128
1/256
. 128/256 64/256 32/256
16/256 8/256 4/256
2/256 1/256
In general, the fractional places will range from 1/2 to 1/(2eN), where N is the number of bits.
Procedure: express everything as a fraction of 2eN (here, 256), then use our conversion strategy for integers.
Convert .625: .625 = X/256; .625 * 256 = 160; .625 = 160/256.
We want to find which of the digits should be 1, such that those numerators
sum to 160.
160 = 128 + 32. So, .625 decimal = .10100000 binary.
Another example: .6875 using 4 bits.
The positions: . 1/2
1/4 1/8
1/16
. 8/16 4/16
2/16 1/16
Express .6875 as a fraction of 16: .6875 = 11/16.
So, .6875 in 4 bits is: .1011
100011.1011 = .1000111011 * 2e+6.
In our representation above:
 0  0000110  1000111011 
Obviously, there is a tradeoff between size (the more bits for the exponent, the larger the range of numbers) and accuracy (the more bits for the mantissa, the more significant digits can be represented).
In 7 bits, with an implied base of 2, we are limited to exponents between 64 to +63. This is not enough of a range.
If we can help it, we don't want to reduce the size of the mantissa field, because then we will lose accuracy.
What can we do?
Change the implied base to a larger power of 2 (4, 8, or 16, say).
Suppose the base is 16. Range roughly 10e76 to 10e76, while still using 7 bits for the mantissa.
Let's consider our example.
Unnormalized, we had 100011.1011
Normalize it:
.001000111011 * 16e2
Our floating point representation:
0  0000010  001000111011000000000000

Shifting the mantissa mantissa to perform normalization must take place in steps of 4bit shifts. A representation is now considered to be normalized if any of the leading 4 bits of its mantissa is 1. This is called Hexadecimal normalization.
So, what is the tradeoff of using the larger base? Extra range at the expense of precision. Even though 24 bits are still used for the mantissa, hex normalization allows the three leading bits of a mantissa to be 0s. So, in some cases, only 21 signficant bits are retained. equires I bits.
Another option: use base 2 and a phantom bit
If the base is 2, then the most signifiant bit of a normalized mantissa
is always 1. So, why store it? This way, you can get 24 bits
of precision in 23 bits, and use the extra bit for the exponent.
This simplifies the circuitry needed for comparing exponents.
Example: 0.00000101... * 16e3
Unnormalized:  1  0000011  00000101.... 
Normalized, base16 representation:
 1  0000010  01010000....

Represented in excess64, base16 representation:
 1  1000010  01010000...
Recall in base 10: 2.94 * 10e2 + 4.31 * 10e4 = 4.3394 * 10e4.
Add/subtract:
Multiply (Divide) :
The main criterion was to maximize precision while retaining a sufficiently large range.
Single precision (32 bits): 1 sign bit  8bit
exponent  23bit mantissa
Double precision (64 bits): 1 sign bit  11bit exponent
 52bit mantissa
Binary normalization (implied base is 2)
A phantom bit is used, so that 24 (53) bits of precision are represented
using 23 (52) bits.
The 8bit exponent is in excess127 representation, where E'=
0 and E'=255 are used to represent special values, such as exact 0 and
infinity.
E: 127 
126 127 
???
+127  +127
+127  +127
E'
0 
1 254
 255
Actually, a more precise version is the following. Recall that $FF_us = 255 and $FF_2c = 1:
E: 128
127  126
127 
+127 +127 
+127 +127 
E' 1
0 
1 254 
Where 1 ($FF) and 0 are the end values used to represent special values.