2.2.2. Types
A variable’s type (or data type) is the characterization of the data that it represents. As mentioned before, a computer only “speaks” in 0s and 1s (binary). A variable is merely a memory location in which a series of 0s and 1s is stored. That binary string could represent a number (either an integer or a floating point number), a single alphanumeric character or series of characters (string), a Boolean type or some other, more complex user-defined type.
The type of a variable is important because it affects how the raw binary data stored at a memory location is interpreted. Moreover, some types take different amounts of memory to store. For example, an integer type could take 32 bits while a floating point type could take 64 bits. Programming languages may support different types and may do so in different ways. In the next few sections we’ll describe some common types that are supported by many languages.
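To make this concrete, the following minimal C sketch (our own illustration, not from the text) prints the sizes of a few types and reinterprets the raw bits of an integer as a floating point value. The value 1078530011 is chosen because, on platforms where int and float are both 32 bits and floats follow IEEE 754, those same bits happen to encode approximately π:

```c
#include <stdio.h>
#include <string.h>

int main(void) {
    /* Sizes are implementation-defined; these are typical values. */
    printf("int:    %zu bytes\n", sizeof(int));    /* typically 4 (32 bits) */
    printf("float:  %zu bytes\n", sizeof(float));  /* typically 4 (32 bits) */
    printf("double: %zu bytes\n", sizeof(double)); /* typically 8 (64 bits) */

    /* The same raw bits mean different things under different types
       (assumes 32-bit int and IEEE 754 float). */
    int i = 1078530011;
    float f;
    memcpy(&f, &i, sizeof f); /* copy the bits, reinterpret as a float */
    printf("the bits of %d, read as a float: %f\n", i, f); /* ~3.141593 */
    return 0;
}
```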
Numeric Types
At their most basic, computers are number crunching machines. Thus, the most basic type of variable that can be used in a computer program is a numeric type. There are several numeric types that are supported by various programming languages. The simplest is an integer type which can represent whole numbers 0, 1, 2, etc. and their negations, −1, −2, . . .. Floating point numeric types represent decimal numbers such as 0.5, 3.14, 4.0, etc. However, neither integer nor floating point types can represent every possible number since they use a finite number of bits to represent the number. We will examine this in detail below. For now, let’s understand how a computer represents both integers and floating point numbers in memory.
As humans, we “think” in base-10 (decimal) because we have 10 fingers and 10 toes. When we write a number with multiple digits in base-10 we do so using “places” (ones place, tens place, hundreds place, etc.). Mathematically, a number in base-10 can be broken down into powers of ten; for example:

$$3{,}201 = 3 \times 10^3 + 2 \times 10^2 + 0 \times 10^1 + 1 \times 10^0$$
In binary, numbers are represented in the same way, but in base-2 in which we only have 0 and 1 as symbols. To illustrate, let’s consider counting from 0: in base-10, we would count 0, 1, 2, . . . , 9 at which point we “carry over” a 1 to the tens spot and start over at 0 in the ones spot, giving us 10, 11, 12, . . . , 19 and repeat the carry-over to 20.
With only two symbols, the carry-over occurs much more frequently: we count 0, 1 and then carry over and have 10. It is important to understand that this is not “ten”: we are counting in base-2, so 10 is actually equivalent to 2 in base-10. Continuing, we have 11 and again carry over, but we carry it over twice giving us 100 (just like we’d carry over twice when going from 99 to 100 in base-10). A full count from 0 to 16 in binary can be found in Table 2.1. In many programming languages, a prefix of 0b is used to denote a number represented in binary. We use this convention in the table.
As a fuller example, consider again the number 3,201. This can be represented in binary as 0b110010000001, since

$$3{,}201 = 2048 + 1024 + 128 + 1 = 2^{11} + 2^{10} + 2^7 + 2^0$$
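The conversion can be carried out programmatically by repeatedly dividing by 2 and reading off the remainders, least significant bit first. A minimal C sketch (the helper name print_binary is our own, for illustration):

```c
#include <stdio.h>

/* Prints the binary representation of a non-negative integer, most
   significant bit first (print_binary is a hypothetical helper). */
void print_binary(unsigned int n) {
    if (n > 1) {
        print_binary(n / 2); /* emit the higher-order bits first */
    }
    printf("%u", n % 2);     /* then the current bit */
}

int main(void) {
    printf("0b");
    print_binary(3201); /* prints 0b110010000001 */
    printf("\n");
    return 0;
}
```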
Representing negative numbers is a bit more complicated and is usually done using a scheme called two’s complement. We omit the details, but essentially the first bit in the representation serves as a sign bit: zero indicates positive, while 1 indicates negative. Negative values are represented as a complement with respect to $2^n$ (a complement is where 0s and 1s are “flipped” to 1s and 0s).

Table 2.1: Counting from 0 to 16 in binary.

Base-10   Binary
0         0b0
1         0b1
2         0b10
3         0b11
4         0b100
5         0b101
6         0b110
7         0b111
8         0b1000
9         0b1001
10        0b1010
11        0b1011
12        0b1100
13        0b1101
14        0b1110
15        0b1111
16        0b10000
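A short C example makes two’s complement visible; it assumes a typical platform with 32-bit ints:

```c
#include <stdio.h>

int main(void) {
    /* Under two's complement, negating flips every bit and adds one. */
    printf("~5 + 1 = %d\n", ~5 + 1);  /* prints -5 */

    /* Reinterpreting -1 as unsigned shows its bits are all 1s:
       on a typical 32-bit int this prints 4294967295 (2^32 - 1). */
    int x = -1;
    printf("-1 as unsigned: %u\n", (unsigned int)x);
    return 0;
}
```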
Some programming languages allow you to define variables that are unsigned in which the sign bit is not used to indicate positive/negative. With the extra bit we can represent numbers twice as big; using n bits we can represent numbers x in the range

$$0 \leq x \leq 2^n - 1$$
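In C, the limits.h header exposes these ranges directly; a brief sketch:

```c
#include <stdio.h>
#include <limits.h>

int main(void) {
    /* Signed: one bit is spent on the sign. */
    printf("signed int:   %d ... %d\n", INT_MIN, INT_MAX);
    /* Unsigned: all n bits contribute to the magnitude, 0 ... 2^n - 1. */
    printf("unsigned int: 0 ... %u\n", UINT_MAX);
    return 0;
}
```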
Floating point numbers in binary are represented in a manner similar to scientific notation. Recall that in scientific notation, a number is normalized by multiplying it by some power of 10 so that its most significant digit is between 1 and 9. The resulting normalized number is called the significand while the power of ten that the number was scaled by is called the exponent (and since we are in base-10, 10 is the base). In general, a number in scientific notation is represented as:

$$\pm\,\text{significand} \times \text{base}^{\text{exponent}}$$
In binary, the significand is often referred to as a mantissa. We also normalize a binary floating point number so that the mantissa is between $\frac{1}{2}$ and 1. This is where the term floating point comes from: the decimal point (more generally called a radix point) “floats” left and right to ensure that the number is always normalized. Our example 3,201 = 0b110010000001, for instance, would be normalized to

$$0b0.110010000001 \times 2^{12}$$
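Conveniently, the C standard library’s frexp function decomposes a double into exactly this form: a mantissa in [0.5, 1) and a power-of-two exponent (compile with -lm on most systems). A quick check with our example:

```c
#include <stdio.h>
#include <math.h>

int main(void) {
    int exponent;
    /* frexp() splits a double into a mantissa in [0.5, 1) and an
       exponent such that x = mantissa * 2^exponent. */
    double mantissa = frexp(3201.0, &exponent);
    printf("3201 = %f * 2^%d\n", mantissa, exponent); /* 0.781494 * 2^12 */
    return 0;
}
```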
Most modern programming languages implement floating point numbers according to the Institute of Electrical and Electronics Engineers (IEEE) 754 Standard [20] (also called the International Electrotechnical Commission (IEC) 60559 [19]). When represented in binary, a fixed number of bits must be used to represent the sign, mantissa and exponent. The standard defines several precisions that each use a fixed number of bits with a resulting number of significant digits (base-10) of precision. Table 2.3 contains a summary of a few of the most commonly implemented precisions.
Just as with integers, the finite precision of floating point numbers results in several limitations. First, irrational numbers such as π = 3.14159 . . . can only be approximated out to a certain number of digits. For example, with single precision π ≈ 3.1415927, which is accurate only to the 6th decimal place, and with double precision, π ≈ 3.1415926535897931, accurate to only 15 decimal places.³ In fact, regardless of how many bits we allow in our representation, an irrational number like π (that never repeats and never terminates) will only ever be an approximation. Real numbers like π require infinite precision, but computers are only finite machines.
Even rational numbers such as $\frac{1}{3} = 0.333\ldots$ (whose decimal expansion repeats forever but is perfectly regular) are not represented exactly when using floating point numbers. In double precision binary,

$$\frac{1}{3} = 0b1.0101010101010101010101010101010101010101010101010101 \times 2^{-2}$$

which when represented in scientific notation in decimal is

$$3.3333333333333331 \times 10^{-1}$$
That is, there are only 16 digits of precision, after which the remaining (infinite) sequence of 3s gets cut off.
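These limitations are easy to observe; the following C sketch prints π and $\frac{1}{3}$ with more digits than each type can actually deliver:

```c
#include <stdio.h>

int main(void) {
    float  pi_f  = 3.14159265358979323846f; /* rounded to ~7 digits on storage */
    double pi_d  = 3.14159265358979323846;  /* rounded to ~16 digits on storage */
    double third = 1.0 / 3.0;

    /* %.17g requests more digits than the types can faithfully hold */
    printf("float  pi: %.17g\n", pi_f);  /* 3.1415927410125732 */
    printf("double pi: %.17g\n", pi_d);  /* 3.1415926535897931 */
    printf("1/3      : %.17g\n", third); /* 0.33333333333333331 */
    return 0;
}
```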
Programming languages usually only support the common single and double precisions defined by the IEEE 754 standard as those are commonly supported by hardware.

³ The first 80 digits of π are 3.14159265358979323846264338327950288419716939937510582097494459230781640628620899, though only 39 digits of π are required to accurately calculate the volume of the known universe to within one atom.
However, there are languages that support arbitrary precision (also called multiprecision) numbers, and yet other languages have libraries to support “big number” arithmetic. Arbitrary precision is still not infinite: instead, as more digits are needed, more memory is allocated. If you want to compute 10 more digits of π, you can, but at a cost: to support the additional digits, more memory is allocated. Also, the arithmetic is performed in software rather than directly in hardware, which can be much slower than fixed-precision arithmetic. Still, there are many applications where such accuracy or such large numbers are absolutely essential.
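As one example among many, the GNU Multiple Precision (GMP) library provides such “big number” types for C. A minimal sketch, assuming libgmp is installed (compile with -lgmp):

```c
#include <stdio.h>
#include <gmp.h>

int main(void) {
    mpz_t big;                  /* an arbitrary precision integer */
    mpz_init(big);
    mpz_ui_pow_ui(big, 2, 200); /* 2^200: far beyond any fixed-width type */
    gmp_printf("2^200 = %Zd\n", big);
    mpz_clear(big);             /* arbitrary precision memory is freed manually */
    return 0;
}
```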
Characters & Strings
Another type of data is textual data, which can be either single characters or a sequence of characters called strings. Strings are sometimes used for human readable data such as messages or output, but may also model general data. For example, DNA is usually encoded using strings consisting of the characters C, G, A, T (corresponding to the nucleobases cytosine, guanine, adenine, and thymine). Numerical characters and punctuation can also be used in strings, in which case they do not represent numbers, but instead may represent textual versions of numerical data.
Different programming languages implement characters and strings in different ways (or may even treat them the same). Some languages implement strings by defining arrays of characters. Other languages may treat strings as dynamic data types. However, all languages use some form of character encoding to represent strings. Recall that computers only speak in binary: 0s and 1s. To represent a character like the capital letter “A”, the binary sequence 0b1000001 is used. In fact, the most common alphanumeric characters are encoded according to the American Standard Code for Information Interchange (ASCII) text standard. The basic ASCII text standard assigns characters to the decimal values 0–127, using 7 bits to encode each character as a number. Table 2.4 contains a complete listing of the standard ASCII character set.
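Since a character is ultimately just a small number, printing one with an integer format specifier reveals its ASCII value. A quick C illustration:

```c
#include <stdio.h>

int main(void) {
    char c = 'A';
    /* A char is stored as a small integer: %c prints the character,
       %d prints the underlying ASCII value. */
    printf("'%c' is stored as %d\n", c, c);     /* 'A' is stored as 65 */
    printf("'%c' is stored as %d\n", 'a', 'a'); /* 'a' is stored as 97 */
    return 0;
}
```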
The ASCII table was designed to enforce a lexicographic ordering: letters are in alphabetic order, uppercase precede lowercase versions, and numbers precede both. This design allows for an easy and natural comparison among strings: “alpha” would come before “beta” because they differ in the first letter, and those characters have numerical values 97 and 98 respectively; since 97 < 98, the order follows. Likewise, “Alpha” would come before “alpha” (since 65 < 97), and “alpha” would come before “alphanumeric”: the sixth character is empty in the first string (usually treated as the null character with value 0) while it is “n” in the second (value of 110). This is the ordering that we would expect in a dictionary.
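This is precisely the ordering that C’s strcmp function implements: it returns a negative value when its first argument comes first. A short sketch reproducing the three comparisons above:

```c
#include <stdio.h>
#include <string.h>

int main(void) {
    /* Each comparison prints 1 (true): the first string precedes
       the second in the ASCII-based lexicographic order. */
    printf("%d\n", strcmp("alpha", "beta") < 0);         /* 'a' (97) < 'b' (98) */
    printf("%d\n", strcmp("Alpha", "alpha") < 0);        /* 'A' (65) < 'a' (97) */
    printf("%d\n", strcmp("alpha", "alphanumeric") < 0); /* '\0' (0) < 'n' (110) */
    return 0;
}
```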
There are several other nice design features built into the ASCII table. For example, to convert between uppercase and lowercase versions of a letter, you only need to “flip” the second bit (0 for uppercase, 1 for lowercase). There are also several special characters that need to be escaped to be defined. For example, though your keyboard has a tab and an enter key, if you wanted to code those characters, you would need to specify them in some way other than using those keys (since typing those keys will affect what you are typing rather than specifying a character). The standard way to escape characters is to use a backslash along with another, single character. The three most common are the (horizontal) tab, \t, the endline character, \n, and the null terminating character, \0. The tab and endline characters are used to specify their respective whitespace characters. The null character is used in some languages to denote the end of a string and is not printable.

Table 2.4: ASCII Character Table. The first and second columns indicate the binary and decimal representation respectively. The third column visualizes the resulting character when possible. Characters 0–31 and 127 are control characters that are not printable or print whitespace. The encoding is designed to impose a lexicographic ordering: A–Z are in order, uppercase letters precede lowercase letters, numbers precede letters and are also in order.
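Both features are easy to demonstrate in C; here the case of a letter is toggled by flipping that single bit (value 32, or 0x20), and the tab and endline escapes are printed:

```c
#include <stdio.h>

int main(void) {
    /* Upper- and lowercase ASCII letters differ in a single bit
       (value 32 = 0x20); XOR-ing with it toggles the case. */
    char upper = 'A';
    char lower = upper ^ 0x20;
    printf("%c <-> %c\n", upper, lower); /* A <-> a */

    /* Escaped whitespace characters: \t is a tab, \n ends the line. */
    printf("col1\tcol2\n");
    return 0;
}
```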
ASCII is quite old, originally developed in the early 1960s; President Johnson first mandated that all computers purchased by the federal government support ASCII in 1968. However, it is quite limited with only 128 possible characters. Since then, additional extensions have been developed. The Extended ASCII character set adds support for 128 additional characters (numbered 128 through 255) by adding 1 more bit (8 total). Included in the extension is support for common international characters with diacritics such as ü and ñ, as well as symbols such as £ (these are characters 129, 164, and 156 respectively).
Even 256 possible characters are not enough to represent the wide array of international characters when you consider languages like Chinese, Japanese, and Korean (CJK). Unicode was developed to solve this problem by establishing a standard encoding that supports 1,112,064 possible characters, though only a fraction of these are actually currently assigned.⁴ Unicode is backward compatible, so it works with plain ASCII characters. In fact, the most common encoding for Unicode, UTF-8, uses a variable number of bytes to encode characters: 1-byte encodings correspond to plain ASCII, and there are also 2-, 3-, and 4-byte encodings.
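The variable-length nature of UTF-8 can be observed by counting bytes. In this C sketch, the two-byte sequence 0xC3 0xA9 (the UTF-8 encoding of “é”) is written with explicit byte escapes so the example does not depend on the source file’s encoding:

```c
#include <stdio.h>
#include <string.h>

int main(void) {
    const char *a       = "A";        /* plain ASCII: 1 byte */
    const char *e_acute = "\xC3\xA9"; /* UTF-8 for e-acute: 2 bytes */

    /* strlen() counts bytes, not characters, so it exposes the
       variable-length encoding. */
    printf("bytes in \"A\":        %zu\n", strlen(a));        /* 1 */
    printf("bytes in \"\\xC3\\xA9\": %zu\n", strlen(e_acute)); /* 2 */
    return 0;
}
```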
In most programming languages, string literals are defined by using either single or double quotes to indicate where the string begins and ends. For example, one may be able to define the string "Hello World" . The double quotes are not part of the string, but instead specify where the string begins and ends. Some languages allow you to use either single or double quotes; PHP, for example, would allow you to also define the same string as 'Hello World' . Yet other languages, such as C, distinguish the usage of single and double quotes: single quotes are for single characters such as 'A' or '\n' while double quotes are used for full strings such as "Hello World" .
In any case, if you want a single or double quote to appear in your string you need to escape it, similar to how the tab and endline characters are escaped. For example, in C, '\'' would refer to the single quote character and "Dwayne \"The Rock\" Johnson" would allow you to use double quotes within a string. In our pseudocode we’ll use stylized double quotes, as in “Hello World”, in any strings that we define.
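In C, for example, these escapes look as follows (a minimal sketch):

```c
#include <stdio.h>

int main(void) {
    char quote = '\''; /* an escaped single quote character */
    /* Escaped double quotes may appear inside a double-quoted string. */
    const char *s = "Dwayne \"The Rock\" Johnson";
    printf("%c%s%c\n", quote, s, quote);
    return 0;
}
```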
⁴ As of 2012, 110,182 are assigned to characters, 137,468 are reserved for private use (they are valid characters, but not defined so that organizations can use them for their own purposes), with 2,048 surrogates and 66 non-character control codes. 864,348 are left unassigned, meaning that we are well-prepared for encoding alien languages when they finally get here.