Before we begin, please allow LZ a few words. Recently I have been writing a series of articles on the principles of computer systems, and I have made up my mind to get through the whole book. The main purpose is to consolidate LZ's own understanding, but I also want to share this material with my ape friends, because LZ believes it can do a great deal to strengthen a programmer's skills.
The trouble is that articles on fundamentals like this are complicated and tedious to write, because they are full of mathematical symbols. Yet despite the effort they take, they are far less popular than lighter essays. This is clear from LZ's blog: almost all of its most-read articles are miscellaneous pieces about experiences, suggestions, and insights.
LZ understands why. Essays don't demand much hard thinking, their content is more entertaining, and readers occasionally take away something unexpectedly valuable, so their popularity makes sense. Still, if you stick with articles on computer system principles, you should gain a great deal.
LZ also hopes that, while reading, ape friends will give LZ some encouragement and support. That would not only boost LZ's motivation but also give LZ a greater sense of responsibility to explain the content simply and clearly.
That's enough rambling. If I keep going, some ape friends will surely start complaining, so I'll stop here. In fact, after all this talk, all LZ really wanted to say was: "Please give this article a recommendation."
In the previous chapter, we focused on how integers are represented, namely unsigned encoding and two's-complement encoding. This time, let's look at how binary integers are extended and truncated. Because this material is presented in the context of C, let's first take a brief look at signed and unsigned numbers in C.
Signed and unsigned numbers in C language
The essential difference between signed and unsigned numbers is their encoding: signed numbers use two's-complement encoding, and unsigned numbers use unsigned encoding.
In C, signed and unsigned numbers convert into each other implicitly, so no explicit cast is required. Precisely because of this, you may accidentally assign a signed value to an unsigned variable and get unexpected results, as shown below.
short i = -12345;          /* 16-bit pattern 0xCFC7 */
unsigned short u = i;      /* same bits read as unsigned: u == 53191 */
Here a negative number silently becomes a positive one, because the 16-bit pattern of -12345 read as unsigned is 53191. The following program shows the counterintuitive results that implicit conversion between signed and unsigned numbers causes in relational operations.
printf("%d\n", -1 < 0U);
printf("%d\n", -12345 < 12345U);
Both lines print 0, meaning false, which is contrary to intuition. The reason is that when a signed int is compared with an unsigned int, C implicitly converts the signed operand to unsigned before comparing. As a result, -1 becomes the largest unsigned value (4294967295 with a 32-bit int).
Converting a short variable to an int variable involves bit extension. On typical platforms, the value is widened from two bytes to four bytes.
The most obvious way to extend a number is to fill the new high-order bits with 0s, that is, to prepend 0s to the original bit sequence. This is called zero extension, and it is used for unsigned numbers. Signed numbers use a different method called sign extension, in which every new high-order bit is a copy of the original sign bit (the most significant bit).
Zero extension obviously leaves the value unchanged. Sign extension does too, although this is less obvious. A simple way to see it uses the usual shortcut for reading a two's-complement value. If the sign bit is 0, read the number as unsigned. If the sign bit is 1 (a negative number), invert all the bits and add 1 to get its magnitude. When we sign-extend a negative number, the prepended bits are all 1s, which become 0s after inversion. The magnitude, and therefore the value, is unchanged.
In short, extension never changes the value being represented.
The book also gives a formal proof that sign extension preserves the value of a negative number. LZ will not reproduce it in full, because it is quite simple and uses only the two's-complement formula. Note that the proof is by induction, so it only needs to show that extending by a single bit preserves the value.
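Below is a sketch of that one-bit step, not a copy of the book's text. It assumes the standard two's-complement formula B2T_w(x) = -x_{w-1}·2^{w-1} + Σ_{i=0}^{w-2} x_i·2^i, where the name B2T_w is borrowed as notation for "binary to two's complement, w bits". Prepend one extra copy of the sign bit and simplify:

```latex
\begin{aligned}
B2T_{w+1}([x_{w-1}, x_{w-1}, x_{w-2}, \ldots, x_0])
  &= -x_{w-1}2^{w} + x_{w-1}2^{w-1} + \sum_{i=0}^{w-2} x_i 2^i \\
  &= -x_{w-1}2^{w-1} + \sum_{i=0}^{w-2} x_i 2^i \\
  &= B2T_w([x_{w-1}, x_{w-2}, \ldots, x_0])
\end{aligned}
```

The key step is -x_{w-1}·2^w + x_{w-1}·2^{w-1} = -x_{w-1}·2^{w-1}. Applying the step repeatedly covers extension by any number of bits. Because the argument never uses the value of x_{w-1}, it holds for non-negative numbers as well.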
Truncation is the opposite of extension: it shortens a binary sequence to fewer bits by discarding the high-order bits.
Intuitively, truncation can distort the data. For unsigned encoding, the truncated value is simply the unsigned value of the low-order bits that remain. The book gives a short proof of this. Its main point is that the values before and after truncation are related by a modulo operation: truncating x to k bits yields x mod 2^k.
For two's-complement encoding, truncation produces the same bit sequence as in the unsigned case. Only one more step is needed: reinterpret the remaining bits by converting from unsigned encoding to two's-complement encoding.
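Therefore, when a w-bit value x is truncated to k bits, the result under unsigned encoding is x mod 2^k. Under two's-complement encoding, the result is that same k-bit pattern reinterpreted as two's complement.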
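In symbols, with k < w bits kept, the two formulas are as follows. U2T_k is assumed notation for reinterpreting a k-bit unsigned value as two's complement. For negative x, "x mod 2^k" means the non-negative remainder.

```latex
\begin{aligned}
\text{unsigned:}\quad & x' = x \bmod 2^k \\
\text{two's complement:}\quad & x' = U2T_k\!\left(x \bmod 2^k\right),
\qquad U2T_k(u) = \begin{cases} u, & u < 2^{k-1} \\ u - 2^k, & u \ge 2^{k-1} \end{cases}
\end{aligned}
```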
Signed vs Unsigned in Other Languages
The analysis above shows that having both signed and unsigned numbers in a language invites avoidable bugs, while unsigned numbers offer little benefit beyond a larger maximum value. For this reason, many languages do not support unsigned numbers.
For example, in Java, the language LZ uses, the integer types byte, short, int, and long are all signed (char, the only unsigned primitive, is meant for characters), which avoids a lot of unnecessary trouble. Unsigned numbers are mostly useful for values that serve as identifiers rather than quantities, such as memory addresses. In that role they resemble a database primary key or the key in a key-value pair: just an identifier.
This article covered signed and unsigned numbers in C, extension from fewer bits to more bits, and truncation from more bits to fewer. The next chapter covers a very important topic: arithmetic operations on integers.