Integers, or whole numbers from elementary mathematics, are the most common and
fundamental numbers used in computers. They are represented as
fixed-point numbers, in contrast to floating-point numbers.
Today we are going to learn a whole bunch of ways to encode them.
There are mainly two properties that distinguish one integer representation from another:
 Size, i.e. the number of bits used: usually a power of 2, e.g. 8-bit, 16-bit, 32-bit, 64-bit.
 Signed or unsigned: there are also multiple schemes to encode a signed integer.
We will also use the following terms throughout the post:
 MSB: Most Significant Bit
 LSB: Least Significant Bit
Prerequisite - printf Recap
We will quickly recap the integer-related subset of printf usage.
Basically, we use format specifiers to interpolate values into strings:
Format Specifier
%[flags][width][.precision][length]specifier
specifier
 d, i: signed decimal
 u: unsigned decimal
 c: char
 p: pointer address
 x / X: lower/upper unsigned hex
length
 l: long (at least 32 bits)
 ll: long long (at least 64 bits)
 h: short (usually 16 bits)
 hh: char, a.k.a. “short short” (usually 8 bits)
using namespace std;
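To make the recap a bit more concrete, here is a minimal sketch of the specifiers and length modifiers listed above. The helper names show_specifiers and show_lengths are made up for illustration, and snprintf is used instead of printf only so the result can be returned as a string.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: render one value with several integer specifiers.
std::string show_specifiers(int v) {
    char buf[96];
    // %d: signed decimal, %u: unsigned decimal, %x / %X: lower/upper hex.
    // The casts make each argument match the type its specifier expects.
    std::snprintf(buf, sizeof buf, "%d %u %x %X",
                  v, static_cast<unsigned>(v),
                  static_cast<unsigned>(v), static_cast<unsigned>(v));
    return buf;
}

// Hypothetical helper: the length modifiers pick the argument width,
// hh (char), h (short), l (long), ll (long long).
std::string show_lengths(long long v) {
    char buf[96];
    std::snprintf(buf, sizeof buf, "%hhd %hd %ld %lld",
                  static_cast<signed char>(v), static_cast<short>(v),
                  static_cast<long>(v), v);
    return buf;
}
```

For instance, show_specifiers(255) should give "255 255 ff FF".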
inttypes.h from C99 (also on cppreference.com)
// signed int (d or i)
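As a rough sketch of how the inttypes.h macros are meant to be used with fixed-width types: each PRI macro expands to the right length and specifier for its type, and gets glued to the surrounding format string by literal concatenation. The helper name show_fixed_width is hypothetical.

```cpp
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: format fixed-width integers without guessing
// whether the platform wants "hhd", "d", "ld" or "lld".
std::string show_fixed_width(std::int8_t i8, std::uint8_t u8, std::int64_t i64) {
    char buf[96];
    // PRId8: signed decimal for int8_t, PRIu8 / PRIx8: unsigned decimal / hex
    // for uint8_t, PRId64: signed decimal for int64_t.
    std::snprintf(buf, sizeof buf,
                  "%" PRId8 " %" PRIu8 " %" PRIx8 " %" PRId64,
                  i8, u8, u8, i64);
    return buf;
}
```

For instance, show_fixed_width(-128, 255, -1) should give "-128 255 ff -1".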
Unsigned Integers
The conversion between unsigned integers and binary is trivial.
Here, we can represent 8 bits (i.e. a byte) as a hex pair, e.g. 255 == 0xff == 0b11111111.
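A minimal sketch of that conversion in both directions, for a single byte. The helper names to_binary and from_binary are made up for illustration.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: spell out the 8 bits of a byte, MSB first.
std::string to_binary(std::uint8_t v) {
    std::string bits;
    for (int i = 7; i >= 0; --i) {
        bits += ((v >> i) & 1) ? '1' : '0';
    }
    return bits;
}

// Hypothetical helper: the inverse, weighting each bit by its power of two.
// Any character other than '1' is treated as a 0 bit in this sketch.
unsigned from_binary(const std::string& bits) {
    unsigned value = 0;
    for (char c : bits) {
        value = value * 2 + (c == '1' ? 1u : 0u);
    }
    return value;
}
```

For instance, to_binary(0xff) should give "11111111", and from_binary("11111111") should give 255.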
Signed Integers
Signed integers are more complicated. We need to split the bit patterns into two halves
to represent both positive and negative integers somehow.
There are four well-known schemes to encode them, according to
the Wikipedia article on signed number representations.
Sign magnitude
It’s also called “sign and magnitude”. The name tells how straightforward it is:
one bit (often the MSB) serves as the sign bit, and the remaining bits indicate
the magnitude (or absolute value), e.g.
binary  sign-magn  unsigned
It was used in early computers (e.g. the IBM 7090) and is now mainly used for the
significand part of floating-point numbers.
Pros:
 simple and natural for humans
Cons:
 2 ways to represent zero (+0 and -0)
 not as good for the machine
  add/sub/cmp require knowing the sign
  complicates CPU ALU design; potentially more cycles
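Sign magnitude can be sketched for 8 bits as follows, assuming the MSB is the sign bit and the low 7 bits hold the magnitude. The helper names are hypothetical, and out-of-range input is not handled.

```cpp
#include <cstdint>

// Hypothetical helper: decode an 8-bit sign-magnitude pattern.
int decode_sign_magnitude(std::uint8_t bits) {
    const int magnitude = bits & 0x7f;            // low 7 bits
    return (bits & 0x80) ? -magnitude : magnitude; // MSB is the sign
}

// Hypothetical helper: encode a value in [-127, 127].
std::uint8_t encode_sign_magnitude(int value) {
    return value < 0 ? static_cast<std::uint8_t>(0x80 | -value)
                     : static_cast<std::uint8_t>(value);
}
```

Both 0000 0000 and 1000 0000 decode to zero here, which is the “2 zeros” problem from the list above.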
Ones’ complement
It forms a negative integer by applying a bitwise NOT to,
i.e. taking the complement of, its positive counterpart.
binary  1s comp  unsigned
N.B. the sign can still be signified by the MSB.
It’s referred to as ones’ complement because the negative can be formed
by subtracting the positive from all ones: 1111 1111 (-0)
1111 1111  -0
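Here is a small sketch showing that, for 8 bits, the bitwise NOT and the subtraction from 1111 1111 give the same pattern. The helper names are made up for illustration.

```cpp
#include <cstdint>

// Hypothetical helper: negate by flipping every bit.
std::uint8_t negate_ones_complement(std::uint8_t bits) {
    return static_cast<std::uint8_t>(~bits);
}

// Hypothetical helper: negate by subtracting from all ones (1111 1111).
std::uint8_t negate_ones_by_subtraction(std::uint8_t bits) {
    return static_cast<std::uint8_t>(0xff - bits);
}

// Hypothetical helper: decode an 8-bit ones' complement pattern.
int decode_ones_complement(std::uint8_t bits) {
    if (bits & 0x80) {
        // Negative: flip the bits back to recover the magnitude.
        return -static_cast<int>(static_cast<std::uint8_t>(~bits));
    }
    return bits;
}
```

For instance, 0111 1111 (127) negates to 1000 0000, which decodes to -127, while 1111 1111 decodes to zero (the second, negative zero).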
The benefit of the complement nature is that addition becomes simple,
except that we need an end-around carry: any carry out of the MSB is added
back into the result to get the correct answer.
0111 1111  127
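The end-around carry can be sketched like this for 8-bit patterns. The helper name add_ones_complement is hypothetical, and overflow of the representable range is not detected.

```cpp
#include <cstdint>

// Hypothetical helper: add two 8-bit ones' complement patterns.
std::uint8_t add_ones_complement(std::uint8_t a, std::uint8_t b) {
    unsigned sum = static_cast<unsigned>(a) + b;  // may need a 9th bit
    if (sum > 0xff) {
        // End-around carry: drop the 9th bit and add it back at the LSB.
        sum = (sum & 0xff) + 1;
    }
    return static_cast<std::uint8_t>(sum);
}
```

For instance, 0111 1111 (127) plus 1111 1110 (-1) carries out of the MSB, and adding that carry back gives 0111 1110 (126). Adding 127 and -127 gives 1111 1111, i.e. the negative zero.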
Pros:
 Arithmetic on the machine is fast.
Cons:
 still 2 zeros!
Two’s complement
Most current architectures adopt this, including x86, MIPS, ARM, etc.
It differs from ones’ complement by one.
binary  2s comp  unsigned
N.B. the sign can still be signified by the MSB.
It’s referred to as two’s complement because the negative can be formed
by subtracting the positive from 2 ** N (congruent to 0000 0000 (+0)),
where N is the number of bits.
E.g., for a uint8_t, the sum of any nonzero number and its two’s complement is
256 (1 0000 0000):
1 0000 0000  256 = 2 ** 8
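A quick sketch of the “subtract from 2 ** 8” view, with hypothetical helper names. The sum is kept in an unsigned so the 9th bit stays visible.

```cpp
#include <cstdint>

// Hypothetical helper: two's complement of an 8-bit pattern,
// formed as 2 ** 8 - bits and truncated back to 8 bits.
std::uint8_t twos_complement_by_subtraction(std::uint8_t bits) {
    return static_cast<std::uint8_t>(256u - bits);
}

// Hypothetical helper: a pattern plus its two's complement, in 9 bits.
// This is 256 for every nonzero pattern and 0 for zero itself,
// and both are congruent to 0 modulo 256.
unsigned sum_with_twos_complement(std::uint8_t bits) {
    return static_cast<unsigned>(bits) + twos_complement_by_subtraction(bits);
}
```

For instance, 127 maps to 1000 0001 (129 as unsigned), and 127 + 129 is 256.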
Because of this, arithmetic becomes much easier. For any number x,
e.g. 127, we can get its two’s complement by:
 ~x => 1000 0000: bitwise NOT (like ones’ complement)
 +1 => 1000 0001: add 1 (the one by which it differs from ones’ complement)
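The two steps above can be sketched for 8 bits as follows. The helper names are made up for illustration.

```cpp
#include <cstdint>

// Hypothetical helper: flip the bits, add one, keep the low 8 bits.
std::uint8_t negate_twos_complement(std::uint8_t bits) {
    return static_cast<std::uint8_t>(~bits + 1);
}

// Hypothetical helper: decode an 8-bit two's complement pattern.
// Patterns with the MSB set stand for (unsigned value - 256).
int decode_twos_complement(std::uint8_t bits) {
    return (bits & 0x80) ? static_cast<int>(bits) - 256
                         : static_cast<int>(bits);
}
```

For instance, negating 0111 1111 (127) gives 1000 0001, which decodes to -127. Note that 1000 0000 (-128) negates to itself, since +128 does not fit in 8 bits.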
Cons:
 badly named?
Pros:
 fast machine arithmetic.
 only 1 zero!
 the minimal negative is -128
Offset binary
It’s also called excess-K or biased representation, where K is
the biasing value (the new 0), e.g. in excess-128:
binary  K = 128  unsigned
It’s now mainly used for the exponent part of floating-point numbers.
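Excess-128 boils down to adding or subtracting the bias. A minimal sketch for 8 bits, where kBias and the helper names are hypothetical and range checking is left out:

```cpp
#include <cstdint>

// Bias K for excess-128: the pattern 1000 0000 (128) becomes the new zero.
const int kBias = 128;

// Hypothetical helper: encode a value in [-128, 127].
std::uint8_t encode_excess_128(int value) {
    return static_cast<std::uint8_t>(value + kBias);
}

// Hypothetical helper: decode an excess-128 pattern.
int decode_excess_128(std::uint8_t bits) {
    return static_cast<int>(bits) - kBias;
}
```

For instance, 0000 0000 decodes to -128, 1000 0000 to 0, and 1111 1111 to 127.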
Type Conversion & Printf
This might be a little bit off topic, but I want to note down what I observed
from experimenting. Basically, printf does not convert your arguments to match
the format specifier (beyond the usual promotions, e.g. uint8_t to int); it merely
interprets the bits of your arguments as you told it.
 UB! stands for undefined behavior
uint8_t u8 = 0b10000000; // 128
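Here is a small sketch that stays on the well-defined side: the same 8 bits are read once as uint8_t and once as int8_t, and both are printed with %d. The helper name is hypothetical, and the result assumes a two’s complement machine (before C++20 the uint8_t to int8_t conversion is implementation-defined).

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: print one byte as unsigned and as signed.
std::string print_as_unsigned_and_signed(std::uint8_t u8) {
    // Same 8 bits, read as a signed value.
    const std::int8_t i8 = static_cast<std::int8_t>(u8);
    char buf[32];
    // Both arguments are promoted to int when passed through "...",
    // and int is exactly what %d expects.
    std::snprintf(buf, sizeof buf, "%d %d", u8, i8);
    return buf;
}
```

With 0b10000000 as input, this should give "128 -128" on mainstream compilers.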
Char & ASCII
Traditionally, char is represented in the computer as 8 bits as well. And
really, ASCII is only defined between 0 and 127 and requires 7 bits.
(8-bit Extended ASCII never became a single, widely supported standard.)
It’s more complicated with extensions such as Unicode nowadays, but we’ll leave
that for future posts dedicated to char and string representation.
So how is a char different from a byte?
Well, the answer is that whether a char is a signed char (think int8_t)
or an unsigned char (think uint8_t) is… implementation-defined.
Many systems made it signed since most types (e.g. int) are signed
by default.
N.B. int is defined by the standard to be equivalent to signed int. This is
not the case for char.
That’s why you often see a typedef such as:
typedef unsigned char Byte_t;
to emphasize that a byte should be just plain, unsigned bits.
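If you want to check what your own implementation does, something like the sketch below should work. The helper names are made up, and the second helper assumes 8-bit bytes and two’s complement.

```cpp
#include <cstring>
#include <limits>
#include <type_traits>

// int and signed int name the same type; plain char is always a distinct
// type from both signed char and unsigned char.
static_assert(std::is_same<int, signed int>::value, "int is signed int");
static_assert(!std::is_same<char, signed char>::value, "char is its own type");
static_assert(!std::is_same<char, unsigned char>::value, "char is its own type");

// Hypothetical helper: does plain char behave as signed here?
bool plain_char_is_signed() {
    return std::numeric_limits<char>::is_signed;
}

// Hypothetical helper: the byte 1000 0000 read through plain char.
// Gives -128 where char is signed and 128 where it is unsigned.
int high_bit_as_plain_char() {
    const unsigned char raw = 0x80;
    char c;
    std::memcpy(&c, &raw, sizeof c);
    return c;
}
```

An unsigned char (or the Byte_t typedef above) sidesteps the question: the same byte always reads as 128.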
 Title: Data Representation - Integer
 Created: 2021-03-28 00:00:00
 Link: posts/b79.html
 License: unless otherwise stated, all articles on this blog are licensed under BY-NC-SA. Please credit the source when reposting!