Assembly Language Programming

Floating Point
CPSC 252 Computer Organization
Ellen Walker, Hiram College
Representing Non-Integers
– Often represented in decimal format
– Some require infinite digits to represent
exactly
– With a fixed number of digits (or bits),
many numbers are approximated
– Precision is a measure of the degree of
approximation
Scientific Notation (Decimal)
• Format: m.mmmm x 10^eeeee
– Normalized = exactly 1 nonzero digit before decimal point
• Mantissa (m) represents the significant digits
– Precision limited by number of digits in mantissa
• Exponent (e) represents the magnitude
– Magnitude limited by number of digits in exponent
– Exponent < 0 for numbers between 0 and 1
Scientific Notation (Binary)
• Format: 1.mmmm x 2^eeeee
– Normalized = 1 before the binary point
• Mantissa (m) represents the significant bits
– Precision limited by number of bits in mantissa
• Exponent (e) represents the magnitude
– Magnitude limited by number of bits in exponent
– Exponent < 0 for numbers between 0 and 1
Binary Examples
• 1/16
1.0 x 2^-4 (mantissa 1.0, exponent -4)
• 32.5
1.000001 x 2^5 (mantissa 1.000001,
exponent 5)
Quick Decimal-to-Binary
Conversion (Exact)
1. Multiply the number by a power of 2
big enough to get an integer
2. Convert this integer to binary
3. Place the binary point the appropriate
number of bits (based on the power of
2 from step 1) from the right of the
number
Conversion Example
• Convert 32.5 to binary
1. Multiply 32.5 by 2 (result is 65)
2. Convert 65 to binary (result is 1000001)
3. Place the binary point (in this case 1 bit
from the right) (result is 100000.1)
• Convert to binary scientific notation
(result is 1.000001 x 2^5)
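
The three conversion steps can be sketched in Python as follows. This is only an illustration: the function name `to_binary_exact` and the `max_shift` cutoff are assumptions made here, not part of the slides, and the standard `fractions` module is used so that every step is exact.

```python
from fractions import Fraction

def to_binary_exact(value, max_shift=64):
    """Convert a non-negative number to a binary string (hypothetical helper)."""
    x = Fraction(value)
    # Step 1: multiply by 2 until the result is an integer.
    shift = 0
    while x.denominator != 1:
        x *= 2
        shift += 1
        if shift > max_shift:
            raise ValueError("no exact binary form within max_shift bits")
    # Step 2: convert this integer to binary.
    bits = bin(x.numerator)[2:]
    # Step 3: place the binary point `shift` bits from the right.
    if shift == 0:
        return bits
    bits = bits.zfill(shift + 1)  # keep at least one digit before the point
    return bits[:-shift] + "." + bits[-shift:]

print(to_binary_exact(32.5))             # 100000.1
print(to_binary_exact(Fraction(1, 16)))  # 0.0001
```

A value such as 1/3 never becomes an integer under repeated doubling, so the sketch gives up after `max_shift` doublings instead of looping forever.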
Floating Point Representation
• Mantissa - m bits (unsigned)
• Exponent - e bits (signed)
• Sign (separate) - 1 bit
• Total = 1+m+e bits
– Tradeoff between precision and magnitude
– Total bits fit into 1 or 2 full words
Implicit First Bit
• Remember the mantissa must always
begin with “1.”
• Therefore, we can save a bit by not
actually representing the 1 explicitly.
• Example:
– Mantissa bits 0001
– Mantissa: 1.0001
Offset Exponent
• Exponent can be positive or negative, but it’s
cleaner (for sorting) to use an unsigned
representation
• Therefore, represent exponents as unsigned,
but add a bias of –((2^(bits-1))-1)
• Examples: 8 bit exponent
– 00000001 = 1 (+ -127) = -126
– 10000000 = 128 (+ -127) = 1
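
A minimal sketch of the offset (biased) exponent, assuming the bias formula given above. The helper names `bias`, `decode_exponent`, and `encode_exponent` are hypothetical, and the sketch ignores any field values a real format reserves for special purposes.

```python
def bias(bits):
    """Bias for an exponent field of the given width: 2^(bits-1) - 1."""
    return (1 << (bits - 1)) - 1

def decode_exponent(field, bits=8):
    """Unsigned field value -> true exponent (add the negative bias)."""
    return field - bias(bits)

def encode_exponent(exponent, bits=8):
    """True exponent -> unsigned field value."""
    field = exponent + bias(bits)
    if not 0 <= field < (1 << bits):
        raise ValueError("exponent does not fit in the field")
    return field

print(decode_exponent(0b00000001))  # -126
print(decode_exponent(0b10000000))  # 1
```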
IEEE 754 Floating Point
Representation (Single)
• Sign (1 bit), Exponent (8 bits), Fraction (23
bits)
– What is the largest value that can be represented?
– What is the smallest positive value that can be
represented?
– How many “significant bits” can be represented?
• Values can be sorted using integer
comparison
– Sign first
– Exponent next (sorted as unsigned)
– Fraction last (also unsigned)
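
The single-precision layout can be inspected from Python with the standard `struct` module. The sketch below is illustrative only. The helper names are made up here, and the two bit patterns used for the largest and smallest normalized values follow from the field widths plus the assumption that the all-zeros and all-ones exponent fields are reserved for special values.

```python
import struct

def float_to_bits(x):
    """32-bit IEEE 754 single-precision pattern of x, as an unsigned int."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

def bits_to_float(bits):
    """Inverse of float_to_bits."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]

def fields(x):
    """Split x into (sign, exponent field, fraction field)."""
    bits = float_to_bits(x)
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF

sign, exponent, fraction = fields(32.5)
print(sign, exponent - 127, format(fraction, "023b"))
# 0 5 00000100000000000000000

# Largest finite value: exponent field 254, fraction all ones.
largest = bits_to_float(0x7F7FFFFF)          # (2 - 2**-23) * 2**127
# Smallest positive normalized value: exponent field 1, fraction 0.
smallest_normal = bits_to_float(0x00800000)  # 2**-126

# For non-negative values, comparing the bit patterns as unsigned
# integers gives the same order as comparing the numbers themselves.
values = [32.5, 0.0625, 1.0, 0.5]
print(sorted(values, key=float_to_bits) == sorted(values))  # True
```

With the implicit leading 1, the 23 fraction bits give 24 significant bits. The sorting check is limited to non-negative values because negative numbers need the sign handled first, as the slide notes.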
Double Precision
• Floating point number takes 2 words (64
bits)
• Sign is 1 bit
• Exponent is 11 bits (vs. 8)
• Fraction is 52 bits (vs. 23)
– Last 32 bits of the fraction are in the second
word
Floating Point Errors
• Overflow
– A positive exponent becomes too large for the
exponent field
• Underflow
– A negative exponent becomes too large in magnitude
for the exponent field
• Rounding (not actually an error)
– The result of an operation has too many significant
bits for the fraction field
Special Values
• Infinity
– Result of dividing a non-zero value by 0
– Can be positive or negative
– Infinity +/- any finite value = Infinity
• Not A Number (NaN)
– Result of an invalid mathematical
operation, e.g. 0/0 or Infinity-Infinity
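
A short Python illustration of these rules, using `math.inf` from the standard library. Python's own `/` operator raises `ZeroDivisionError` for a float division by zero instead of returning Infinity or NaN, so the sketch starts from `math.inf` rather than from a division.

```python
import math

inf = math.inf

print(inf + 1.0e300)          # inf
print(-inf - 1.0)             # -inf
print(math.isnan(inf - inf))  # True
print(math.isnan(inf * 0.0))  # True

# A NaN compares unequal to everything, including itself.
nan = inf - inf
print(nan == nan)             # False
```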
Representing Special Values
in IEEE 754
• Exponent ≠ 00, Exponent ≠ FF (exponent field in hex)
– Ordinary floating point number
• Exponent = 00, Fraction = 0
– Number is 0
• Exponent = 00, Fraction ≠ 0
– Number is denormalized (leading 0. instead of 1.)
• Exponent = FF, Fraction = 0
– Infinity (+ or -, depending on sign)
• Exponent = FF, Fraction ≠ 0
– Not a Number (NaN)
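
The case analysis above translates directly into a small classifier for 32-bit single-precision patterns. The function name `classify` and the returned labels are assumptions chosen for this sketch.

```python
def classify(bits):
    """Classify a 32-bit IEEE 754 single-precision bit pattern."""
    exponent = (bits >> 23) & 0xFF  # 8-bit exponent field
    fraction = bits & 0x7FFFFF      # 23-bit fraction field
    if exponent == 0x00:
        return "zero" if fraction == 0 else "denormalized"
    if exponent == 0xFF:
        return "infinity" if fraction == 0 else "NaN"
    return "normal"

print(classify(0x42020000))  # normal (this pattern is 32.5)
print(classify(0x80000000))  # zero (negative zero)
print(classify(0x00000001))  # denormalized
print(classify(0xFF800000))  # infinity (negative)
print(classify(0x7FC00000))  # NaN
```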
Double Precision in MIPS
• Each even register can be considered a
register pair for double precision
– High order bits in even register
– Low order bits in odd register
Floating Point Arithmetic in
MIPS
• add.s, add.d, sub.s, sub.d [rd] [rs] [rt]
– Single and double precision addition /
subtraction
– rd = rs +/- rt
• 32 floating point registers $f0 - $f31
– Use in pairs for double precision
– Registers for add.d (etc) must be even
numbers
Why Separate Floating Point
Registers?
• Twice as many registers using the same
number of instruction bits
• Integer & floating point operations
usually on distinct data
• Increased parallelism possible
• Customized hardware possible
Load/Store Floating Point
Number
• lwc1 32 bit word to FP register
• swc1 FP register to 32 bit word
• ldc1 2 words to FP register pair
• sdc1 register pair to 2 words
• (Note last character is the number 1)
Floating Point Addition
• Align the binary points (make exponents
equal)
• Add the revised mantissas
• Normalize the sum
Changing Exponents for
Alignment and Normalization
• To keep the number the same:
– Left shift mantissa by 1 bit and decrement
exponent
– Right shift mantissa by one bit and increment
exponent
• Align by right-shifting smaller number
• Normalize by
– Round result to correct number of significant bits
– Shift result to put 1 before binary point
Addition Example
Add 1.101 x 2^4 + 1.101 x 2^5 (26+52)
• Align binary points
1.101 x 2^4 = 0.1101 x 2^5
• Add mantissas
    0.1101 x 2^5
  + 1.1010 x 2^5
  --------------
   10.0111 x 2^5
Addition Example (cont.)
• Normalize:
10.0111 x 2^5 = 1.00111 x 2^6 (78)
• Round to 3-bit mantissa:
1.00111 x 2^6 ~= 1.010 x 2^6 (80)
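
The example can be reproduced with integer arithmetic. The sketch below is an illustration under stated assumptions, not a model of real hardware: significands are stored as integers (1.101 becomes 0b1101), only positive normalized inputs are handled, and the aligned sum is kept in units of the smaller operand's last bit so that no bits are lost before rounding. The names `fp_add` and `value` are hypothetical.

```python
FRAC_BITS = 3  # mantissa bits after the binary point in the toy format

def fp_add(sig_a, exp_a, sig_b, exp_b):
    """Add two positive normalized numbers, each sig * 2**(exp - FRAC_BITS)."""
    # Make operand a the one with the larger exponent.
    if exp_a < exp_b:
        sig_a, exp_a, sig_b, exp_b = sig_b, exp_b, sig_a, exp_a
    # Align and add: the sum is an exact integer scaled by 2**(exp_b - FRAC_BITS).
    shift = exp_a - exp_b
    total = (sig_a << shift) + sig_b
    # Normalize: keep FRAC_BITS + 1 leading bits (extra >= 1 for such inputs).
    extra = total.bit_length() - (FRAC_BITS + 1)
    kept = total >> extra
    remainder = total & ((1 << extra) - 1)
    half = 1 << (extra - 1)
    # Round to nearest, ties to even.
    if remainder > half or (remainder == half and kept & 1):
        kept += 1
        if kept.bit_length() > FRAC_BITS + 1:  # rounding carried out
            kept >>= 1
            extra += 1
    return kept, exp_b + extra

def value(sig, exp):
    """Numeric value of a (significand, exponent) pair."""
    return sig * 2.0 ** (exp - FRAC_BITS)

sig, exp = fp_add(0b1101, 4, 0b1101, 5)  # 26 + 52
print(bin(sig), exp, value(sig, exp))    # 0b1010 6 80.0
```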
Rounding
• At least 1 bit beyond the last bit is needed
• Rounding up could require renormalization
– Example: 1.1111 -> 10.000
• For multiplication, 2 extra bits are needed in
case the product’s first bit is 0 and it must be
left shifted (guard, round)
• For complete generality, add “sticky bit” that
is set whenever additional bits to the right
would be >0
Round to Nearest Even
• Most common rounding mode
• If the actual value is halfway between
two values round to an even result
• Examples:
– 1.0011 -> 1.010
– 1.0101 -> 1.010
• If the sticky bit is set, round up because
the value isn’t really halfway between!
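
A minimal sketch of round-to-nearest-even on binary strings, using a guard bit (the first dropped bit) and a sticky bit (set if any later dropped bit is 1). The function name and the string-based interface are assumptions made for readability, and `frac_bits` is assumed to be at least 1.

```python
def round_to_nearest_even(binary, frac_bits):
    """Round a binary string such as '1.0011' to frac_bits fraction bits."""
    whole, frac = binary.split(".")
    if len(frac) <= frac_bits:
        return whole + "." + frac.ljust(frac_bits, "0")
    kept = int(whole + frac[:frac_bits], 2)  # bits that survive
    guard = frac[frac_bits] == "1"           # first dropped bit
    sticky = "1" in frac[frac_bits + 1:]     # any 1 further right
    # Round up when above halfway, or exactly halfway with an odd last bit.
    if guard and (sticky or kept & 1):
        kept += 1
    digits = bin(kept)[2:].zfill(frac_bits + 1)
    return digits[:-frac_bits] + "." + digits[-frac_bits:]

print(round_to_nearest_even("1.0011", 3))   # 1.010
print(round_to_nearest_even("1.0101", 3))   # 1.010
print(round_to_nearest_even("1.00111", 3))  # 1.010 (sticky bit set)
print(round_to_nearest_even("1.1111", 3))   # 10.000 (needs renormalizing)
```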
Floating point addition
1. Compare the exponents of the two numbers.
Shift the smaller number to the right until its
exponent would match the larger exponent
2. Add the significands
3. Normalize the sum, either shifting right and
incrementing the exponent or shifting left
and decrementing the exponent
– Overflow or underflow? Yes: Exception. No: continue to step 4
4. Round the significand to the appropriate
number of bits
– Still normalized? No: return to step 3. Yes: Done
[Figure: floating point addition hardware. The two inputs and the result each have Sign, Exponent, and Fraction fields. Labeled units: Small ALU (exponent difference), Shift right, Big ALU, Increment or decrement, Shift left or right, Rounding hardware, Control]
Floating Point Multiplication
1. Calculate new exponent by adding
exponents together
2. Multiply the significands (using shift &
add)
3. Normalize the product
4. Round
5. Set the sign
Adding Exponents
• Remember that exponents are biased
– Adding exponents adds 2 copies of bias!
(exp1 + 127) + (exp2 + 127) =
(exp1+exp2 + 254)
• Therefore, subtract the bias from the
sum and the result is a correctly biased
value
Multiplication Example
• Convert 2.25 x 1.5 to binary floating point (3
bits exponent, 3 bits mantissa)
• 2.25 = 10.01 * 2^0 = 1.001 * 2^1
• Exp = 100 (because bias is 3)
• 2.25 = 0 100 001
• 1.5 = 1.100 * 2^0
• Exp = 011, Mantissa: 100
• 1.5 = 0 011 100
1. Add Exponents
0 100 001 x 0 011 100
• Add Exponents (and subtract bias)
100 + 011 – 011 = 100
2. Multiply Significands
0 100 001 x 0 011 100
• Remember to restore the leading 1
• Remember that the number of binary places
doubles
      1.001
    x 1.100
    -------
   0.100100
 + 1.001000
 ----------
   1.101100 x 2^1
Finish Up
• Product is 1.1011 * 2^1
• Already normalized
• But, too many bits, so we need to round
• Nearest even number (up) is 1.110
• Result: 0 100 110
• Value is 1.75 * 2 = 3.5
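
The whole multiplication example can be checked with a small Python sketch of the toy format (1 sign bit, 3 exponent bits with bias 3, 3 mantissa bits). Numbers are represented as `(sign, exponent field, fraction field)` tuples. The names `fp_multiply` and `decode` are hypothetical, and treating exponent fields 000 and 111 as out of range is an assumption borrowed from IEEE 754, since the slides do not say how the toy format handles them.

```python
EXP_BITS = 3
FRAC_BITS = 3
BIAS = (1 << (EXP_BITS - 1)) - 1  # 3

def fp_multiply(a, b):
    """Multiply two toy-format numbers given as (sign, exp, frac) tuples."""
    sign_a, exp_a, frac_a = a
    sign_b, exp_b, frac_b = b
    # 1. Add the biased exponents and subtract one copy of the bias.
    exp = exp_a + exp_b - BIAS
    # 2. Multiply the significands with the leading 1 restored.
    #    The product has 2 * FRAC_BITS bits after the binary point.
    product = ((1 << FRAC_BITS) | frac_a) * ((1 << FRAC_BITS) | frac_b)
    # 3. Normalize: the product is in [1, 4), so shift once more if >= 2.
    extra = FRAC_BITS
    if product >> (2 * FRAC_BITS + 1):
        extra += 1
        exp += 1
    # 4. Round to nearest even, renormalizing if rounding carries out.
    kept = product >> extra
    remainder = product & ((1 << extra) - 1)
    half = 1 << (extra - 1)
    if remainder > half or (remainder == half and kept & 1):
        kept += 1
        if kept >> (FRAC_BITS + 1):
            kept >>= 1
            exp += 1
    if not 0 < exp < (1 << EXP_BITS) - 1:
        raise ArithmeticError("exponent overflow or underflow")
    # 5. Set the sign.
    return sign_a ^ sign_b, exp, kept & ((1 << FRAC_BITS) - 1)

def decode(number):
    """Numeric value of a normalized toy-format number."""
    sign, exp, frac = number
    return (-1) ** sign * (1 + frac / (1 << FRAC_BITS)) * 2.0 ** (exp - BIAS)

result = fp_multiply((0, 0b100, 0b001), (0, 0b011, 0b100))  # 2.25 x 1.5
print(result, decode(result))  # (0, 4, 6) 3.5
```

The exact product is 3.375, so the printed 3.5 shows the rounding error introduced by the 3-bit mantissa.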
Types of Errors
• Overflow
– Positive exponent too large for the number
of bits allotted
• Underflow
– Negative exponent is too small (too negative) to fit in the
# bits
• Rounding error
– Mantissa has too many bits
Overflow and Underflow
• Addition
– Overflow is possible when adding two positive or
two negative numbers
• Multiplication
– Overflow is possible when multiplying two large
absolute value numbers
– Underflow is possible when multiplying two
numbers very close to 0
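
Both effects are easy to observe with Python floats, which are IEEE 754 double precision on common platforms (an assumption of this sketch). `sys.float_info` supplies the largest finite and the smallest positive normalized values.

```python
import math
import sys

largest = sys.float_info.max   # largest finite double
smallest = sys.float_info.min  # smallest positive normalized double

# Overflow: the exponent no longer fits, so the result becomes Infinity.
print(largest + largest)       # inf
print(largest * 2.0)           # inf

# Underflow: the product is too close to 0 to represent and becomes 0.
print(smallest * smallest)     # 0.0
```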
Limitations of Finite Floating
Point Representations
• Gap between 0 and the smallest nonzero number
• Gaps between values when the last bit of
the mantissa changes
• Fixed number of values between 0 and 1
• Significant effects of rounding in
mathematical operations
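
The gaps can be measured for single precision by stepping the bit pattern by one. This is a sketch using the standard `struct` module. The helper names are made up here, and `next_single` is only meant for non-negative finite inputs that are already exact single-precision values.

```python
import struct

def to_single(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

def next_single(x):
    """Next larger single-precision value after x (x >= 0 and finite)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return struct.unpack(">f", struct.pack(">I", bits + 1))[0]

print(next_single(0.0) == 2.0**-149)       # True: gap between 0 and the smallest value
print(next_single(1.0) - 1.0 == 2.0**-23)  # True: gap just above 1
print(next_single(2.0**24) - 2.0**24)      # 2.0: gap just above 2^24
print(to_single(16777217.0))               # 16777216.0: 2^24 + 1 falls in a gap
```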
Implications for Programmers
• Mathematical rules are not always followed
– (a / b) * b does not always equal a
– (a + b) + c does not always equal a + (b + c)
• Use inequality comparisons instead of directly
comparing floating point numbers (with ==)
– if ((x > -epsilon) && (x < epsilon)) instead of
if (x == 0)
– Epsilon can be set based on problem or
knowledge of representation (e.g. single vs.
double precision)
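
A short Python demonstration of these points. The helper names `nearly_zero` and `nearly_equal` and the default `epsilon` of 1e-9 are assumptions chosen for illustration. A suitable epsilon depends on the problem and on the precision in use.

```python
def nearly_zero(x, epsilon=1e-9):
    """True if x lies strictly between -epsilon and epsilon."""
    return -epsilon < x < epsilon

def nearly_equal(a, b, epsilon=1e-9):
    """True if a and b differ by less than epsilon."""
    return nearly_zero(a - b, epsilon)

# (a + b) + c does not always equal a + (b + c)
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left == right)              # False
print(nearly_equal(left, right))  # True

# (a / b) * b does not always equal a
quotient = (1.0 / 49.0) * 49.0
print(quotient == 1.0)            # False
print(nearly_equal(quotient, 1.0))  # True

# A result that "should" be 0 usually is not exactly 0
residue = 0.1 + 0.2 - 0.3
print(residue == 0)               # False
print(nearly_zero(residue))       # True
```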
The Pentium Floating Point
Bug
• To speed up division, a table was used
• It was assumed that 5 elements of the table
would never be accessed (and the hardware
was optimized to make them 0)
• These table elements occasionally caused
errors in bits 12 to 52 of floating point
significands
• (see Section 3.8 for more)
A Marketing Error
• July 1994 - Intel discovers the bug, decides
not to halt production or recall chips
• September 1994 - A professor discovers the
bug, posts to Internet (after attempting to
inform Intel)
• November 1994 - Press articles, Intel says the bug
will affect “maybe several dozen people”
• December 1994 - IBM disputes the claim and
halts shipment of Pentium based PCs.
• Late December 1994 - Intel apologizes
The “Big Picture”
• Bits in memory have no inherent meaning. A
given sequence can contain
– An instruction
– An integer
– A string of characters
– A floating point number
• All number representations are finite
• Finite arithmetic requires compromises