3.4 Comparing strings
Before we can take a closer look at how to compare strings in Turing, we need to consider how characters are
stored in memory. All data on a computer, whether MP3s, graphics or an essay, is stored internally as
numbers using the binary system (that's where the adjective digital comes from). Electrical signals
are used to represent all those 1s and 0s. So characters are also stored as numbers. Computers
use some kind of code to represent each character. There are a number of different systems depending on
what kind of computer and language you are using. IBM mainframe computers use a system known as
EBCDIC. Turing and most personal computers use ASCII. More recently, a system known as Unicode has become
widely used, most notably by the Java programming language. Let's look at the ASCII system since
that's what we'll see in Turing.
Here is a table that summarizes the ASCII codes for the characters you can type on the keyboard.
      0 1 2 3 4 5 6 7 8 9
 30 |       ! " # $ % & '
 40 | ( ) * + , - . / 0 1
 50 | 2 3 4 5 6 7 8 9 : ;
 60 | < = > ? @ A B C D E
 70 | F G H I J K L M N O
 80 | P Q R S T U V W X Y
 90 | Z [ \ ] ^ _ ` a b c
100 | d e f g h i j k l m
110 | n o p q r s t u v w
120 | x y z { | } ~
The first character actually in the table is code 32, which is the space character. Codes 30 and 31 are for
characters the keyboard doesn't generate. The last code shown is 126. The last three positions in the chart
(127 to 129) are also not characters the keyboard generates. As some examples of reading this table, the letter 'A' has a code of 65 and
the letter 'a' has a code of 97. Notice the codes for the upper case letters are all smaller than the codes for
the lower case letters. Notice also that the codes for the digits are consecutive and come before both the
lower and upper case letters. This will help us figure out how to order strings.
You do not need to memorize these codes. You just need to know some basic properties about the code as mentioned above.
If you need to know a particular code you can look it up. You can also use some Turing functions.
ord
The ord function will tell you the ASCII value for any character or a string of length 1.
- ord ('B') returns 66
- ord ("9") returns 57
- ord ("abc") gives an error
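As a small sketch of how ord can confirm the properties noted earlier, the following program checks that the digits come before the upper case letters, which come before the lower case letters. It also uses the fact that the digit codes are consecutive to turn a digit character into its numeric value. The variable name digit is only for illustration.

```turing
% Digits come before upper case letters, which come before lower case letters.
if ord ("9") < ord ("A") and ord ("Z") < ord ("a") then
    put "digits < upper case < lower case"
end if

% The digit codes are consecutive, so subtracting ord ("0")
% from a digit's code gives the value of that digit.
var digit : string := "7"
put ord (digit) - ord ("0")
```

Since ord ("7") is 55 and ord ("0") is 48, the last statement should print 7.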
chr
The chr function will tell you the character that is represented by a particular code. Standard ASCII defines codes
0 to 127, but chr accepts any value from 0 to 255. If you pass chr an int less than 0 or greater than 255 you will get an error.
- chr(67) returns 'C'
- chr(56) returns '8'
- chr(ord('E')) returns 'E'
- ord(chr(90)) returns 90
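If you would rather generate the codes than look them up, a short loop with chr can list them. This is only a sketch; the loop variable n is an illustrative name, and the range 32 to 126 is the range of keyboard characters described above.

```turing
% Print each code from 32 to 126 beside the character it represents.
for n : 32 .. 126
    put n, " ", chr (n)
end for
```

The first line printed shows code 32 followed by the space character, so it looks as if nothing follows the number.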
Now we can look at how strings are ordered. The same approach is used that would be used to find a word in a
dictionary with one exception: upper case and lower case letters are different, and upper case letters come first. The
ASCII code is used to determine the order of characters. It is often a good idea to convert strings to all upper case
or all lower case before comparing them. We'll look at how to do this a bit later in this section.
- "first" < "frost" - the first character in each string is the same so we compare the next character. The letter
'i' comes before the letter 'r' so "first" comes before "frost".
- "the" < "them" - "the" starts with the same letters as "them" but it is shorter so it comes first
- "abc" > "ABC" - lower case letters come after uppercase letters so "abc" comes after "ABC"
- "2" > "10" - since the character '2' comes after the character '1'
- "$4.25" < "(4.25)" - since the character '$' comes before the character '('
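The character-by-character idea behind these comparisons can be sketched as a short program. Turing's < operator already does this work for you, so this is only an illustration of the reasoning, not how you would normally compare strings. The variable names a, b and i are chosen just for this example.

```turing
var a : string := "first"
var b : string := "frost"
var i : int := 1

% Move past the characters the two strings have in common.
loop
    exit when i > length (a) or i > length (b)
    exit when a (i) not= b (i)
    i := i + 1
end loop

if i > length (a) or i > length (b) then
    % We ran off the end of a string without finding a difference.
    put "No difference found, so the shorter string comes first"
else
    % The codes of the first differing characters decide the order.
    put "First difference at position ", i
    put a (i), " has code ", ord (a (i))
    put b (i), " has code ", ord (b (i))
end if
```

For "first" and "frost" the first difference is at position 2, where 'i' has code 105 and 'r' has code 114, which is why "first" comes before "frost".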
To convert a string to all lower case letters we can use the approach in the following program.
var s : string := "Caitlin Smith"
var t : string := "caitlin jones"
if s < t then
    put s, " comes before ", t
else
    put t, " comes before ", s
end if
% convert s to all lower case
s := Str.Lower(s)
put "s is now -> ", s
if s < t then
    put s, " comes before ", t
else
    put t, " comes before ", s
end if
The program produces this output:
Caitlin Smith comes before caitlin jones
s is now -> caitlin smith
caitlin jones comes before caitlin smith
The function Str.Lower takes a string as a parameter. It returns the same string as the parameter except that any upper case letters will be converted to lower case. There is a similar function Str.Upper that returns a string with all lower case letters converted to upper case. Note that the "S" in Str and the "L" in Lower are capitalized. All the previous keywords we've used have been all lower case.
Here is an example using both:
var s : string := "abcDEF123!$&UVWxyz"
var s2, s3 : string
s2 := Str.Upper(s)
put "s= ", s, " and s2 = ", s2
s3 := Str.Lower(s)
put "s= ", s, " and s3 = ", s3
Here is the output:
s= abcDEF123!$&UVWxyz and s2 = ABCDEF123!$&UVWXYZ
s= abcDEF123!$&UVWxyz and s3 = abcdef123!$&uvwxyz
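Because Str.Lower returns a new string and leaves its parameter alone, one possible way to compare two strings without regard to case is to convert both sides inside the comparison itself. This is a sketch of that idea that reuses the two names from the earlier program; the original strings keep their capitalization.

```turing
var s : string := "Caitlin Smith"
var t : string := "caitlin jones"

% Compare lower case copies; s and t themselves are not changed.
if Str.Lower (s) < Str.Lower (t) then
    put s, " comes before ", t
else
    put t, " comes before ", s
end if
```

Here the comparison is between "caitlin smith" and "caitlin jones", so the program should print: caitlin jones comes before Caitlin Smith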
Exercise 3.4
- State which of the following comparisons are true.
  - "Robert" <= "Rob"
  - "Thomas" > "magnum"
  - "27" > "134"
  - "2 * 3" <= "2 + 3"
  - "#$!*&" < "$@?!%"
  - "principal" < "principle"
- State the output of the following statements. If any will generate errors state why.
  - put ord('Z')
  - put ord("#")
  - put ord (82)
  - put chr(-10)
  - put chr(82)
  - put chr(ord(chr(65)))
- Write a program that reads a string, reverses the positions of the characters in the string,
and then prints it. For example, if the string starts with the value "ping pong", it will end
having the value "gnop gnip".
- Write a program that reads a string and prints it, then converts the string to upper case
and prints it again.