3.4 Comparing strings

Before we can take a closer look at how to compare strings in Turing, we need to consider how characters are stored in memory. All data on computers, whether it be mp3s, graphics or an essay are stored internally as numbers using the binary system (that's where the adjective digital comes from). Electrical signals are used to represent all those 1s and 0s. So characters are also stored as numbers. Computers use some kind of code to represent each character. There are a number of different systems depending on what kind of computer and languagae you are using. IBM mainframe computers use a system known as EBCDIC. Turing and most personal computers use ASCII. Recently a system known as Unicode is being more widely used, most notably by the Java programming language. Let's look at the ASCII system since that's what we'll see in Turing.

Here is a table that summarizes the ASCII codes for the characters you can type on the keyboard.

      0  1  2  3  4  5  6  7  8  9
30  |          !  "  #  $  %  &  '
40  | (  )  *  +  ,  -  .  /  0  1
50  | 2  3  4  5  6  7  8  9  :  ;
60  | <  =  >  ?  @  A  B  C  D  E
70  | F  G  H  I  J  K  L  M  N  O
80  | P  Q  R  S  T  U  V  W  X  Y
90  | Z  [  \  ]  ^  _  `  a  b  c
100 | d  e  f  g  h  i  j  k  l  m
110 | n  o  p  q  r  s  t  u  v  w
120 | x  y  z  {  |  }  ~
The first character actually in the table is code 32 which is a space character. Codes 30 and 31 are for characters the keyboard doesn't generate. The last code is 126. The last 3 in the chart are characters they keyboard doesn't generate. As some examples of reading this table, the letter 'A' has a code of 65 and the letter 'a' has a code of 97. Notice the codes for the upper case letters are all smaller than the codes for the lower case letters. Notice also that the codes for the digits are consecutive and come before both the lower and upper case letters. This will help us figure out how to order strings.

You do not need to memorize these codes. You just need to know some basic properties about the code as mentioned above. If you need to know a particular code you can look it up. You can also use some Turing functions.

ord

The ord function will tell you the ASCII value for any character or a string of length 1.

chr

The chr function will tell you the character that is represented by a particular code. ASCII values are only defined from 0 to 255. If you pass chr an int less than 0 or greater than 255 you will get an error.

Now we can look at how strings are ordered. The same approach is used that would be used to find a word in a dictionary with one exception. Upper case and lower case letters are different and upper case letters come first. The ASCII code is used to determine the order of characters. It is often a good idea to convert strings to all upper case or all lower case before comparing them. We'll look at how to do this a bit later in this section.

To convert strings strings to all lower case letters we can use the approach in the following program.

var s : string := "Caitlin Smith"
var t : string := "caitlin jones"

if s < t then
   put s, " comes before ", t
else
   put t, " comes before ", s
end if

% convert s to all lower case
s := Str.Lower(s)

put "s is now -> ", s

if s < t then
   put s, " comes before ", t
else
   put t, " comes before ", s
end if  

The program produces this output:

Caitlin Smith comes before caitlin jones
s is now -> caitlin smith
caitlin jones comes before caitlin smith

The function Str.Lower takes a string as a parameter. It returns the same string as the parameter except that any upper case letters will be converted to lower case. There is a similar function Str.Upper that returns a string with all lower case letters converted to upper case. Note that the "S" in Str and the "L" in Lower are capitalized. All the previous keywords we've used have been all lower case. Here is an example using both:

var s : string := "abcDEF123!$&UVWxyz"
var s2, s3 :string
s2 := Str.Upper(s)
put "s= ", s, " and s2 = ", s2
s3 := Str.Lower(s)
put "s= ", s, " and s3 = ", s3

Here is the output:

s= abcDEF123!$&UVWxyz and s2 = ABCDEF123!$&UVWXYZ
s= abcDEF123!$&UVWxyz and s3 = abcdef123!$&uvwxyz

Exercise 3.4

  1. State which of the following comparisons are true.
    1. "Robert" <= "Rob"
    2. "Thomas" > "magnum"
    3. "27" > "134"
    1. "2 * 3" <= "2 + 3"
    2. "#$!*&" < "$@?!%"
    3. "principal" < "principle"
  2. State the output of the following statements. If any will generate errors state why.
    1. put ord('Z')
    2. put ord("#")
    3. put ord (82)
    1. put chr(-10)
    2. put chr(82)
    3. put chr(ord(chr(65)))

  3. Write a program, that reads a string. It reverses the positions of the characters in the string and then prints it. For example, if the string starts with the value "ping pong", it will end having the value "gnop gnip".
  4. Write a program that reads a string and prints it. Convert the string to upper case and print it.