Lesson 8 - Strings in Python - Working with single characters


In the last lesson, Lists in Python, we learned to work with lists. If you noticed some similarities between lists and strings, then you were absolutely onto something. For the rest of you, it may come as a surprise that a string is essentially a sequence of characters, and we can work with it in much the same way as with a list.

First, let's store a string in a variable and print it:

s = "Hello ICT.social"
print(s)

The output:

Console application
Hello ICT.social

We can access the characters of a string using square brackets, just like we do with list items. Keep in mind that characters at given positions are read-only in Python. For example, we can't write the following:
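
As a minimal sketch of the bracket access described above (the positions chosen here are arbitrary and only for illustration), indexing starts at 0 and negative indexes count from the end:

```python
s = "Hello ICT.social"

first = s[0]   # indexing starts at 0
third = s[2]   # the third character
last = s[-1]   # negative indexes count from the end

print(first, third, last)
print(len(s))  # the number of characters in the string
```

Reading a character this way never changes the string itself.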

# This code doesn't work
s = "Hello ICT.social"
s[1] = "o"
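
If we run the assignment above, Python stops with a TypeError. The following sketch catches that error so we can look at its message; the variable name error_message is only chosen for this illustration:

```python
s = "Hello ICT.social"
error_message = None

try:
    s[1] = "o"  # strings are immutable, so this line raises a TypeError
except TypeError as error:
    error_message = str(error)

print(error_message)
print(s)  # the string is unchanged
```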

However, there is a simple workaround: we convert the string to a list. We can then treat the characters the same way we learned to treat list items. Finally, we get our string back using the join() method.

s = list("Hello ICT.social")
print(s)
s[1] = "o"
s = "".join(s)
print(s)

Notice that we call the join() method on an empty string. The string we call it on is the separator that goes between the characters, so we could specify any other string there as well. The output:

Console application
['H', 'e', 'l', 'l', 'o', ' ', 'I', 'C', 'T', '.', 's', 'o', 'c', 'i', 'a', 'l']
Hollo ICT.social
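
To illustrate the separator mentioned above, here is a small sketch that joins the same list of characters with different separators (the sample word is just an example):

```python
letters = list("Hello")

no_separator = "".join(letters)     # the characters glued back together
with_dashes = "-".join(letters)     # a dash between each pair of characters
with_commas = ", ".join(letters)    # the separator can be longer than one character

print(no_separator)
print(with_dashes)
print(with_commas)
```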

Analyzing character occurrences in a sentence

Let's write a simple program that analyzes a given sentence for us. We'll count the vowels, the consonants, and the non-alphanumeric characters (e.g. a space or !).

We'll hard-code the input string in our code so we won't have to write it again every time. Once the program is complete, we'll replace the string with input(""). We'll iterate over the characters using a loop. By the way, we won't focus as much on program speed here (we'll choose practical and simple solutions).

First, let's define vowels and consonants. We don't have to count non-alphanumeric characters since they'll be the string length minus the number of vowels and consonants. Since we don't want to deal with uppercase and lowercase letters separately, we'll convert the entire string to lowercase at the start. Let's also set up variables for the individual counters. Since the code is a bit more complex, we'll add comments.

# the string that we want to analyze
s = "A programmer gets stuck in the shower because the instructions on the shampoo were: Lather, Wash, and Repeat."
s = s.lower()

# Counters initialization
vowels_count = 0
consonants_count = 0

# definition of character groups
vowels = "aeiouy"
consonants = "bcdfghjklmnpqrstvwxz"

# the main loop
for char in s:
    pass  # placeholder; the counting logic is added in the next step

First of all, we prepare the string and convert it to lowercase. Then, we initialize the counters to zero. For the definitions of the character groups, ordinary strings are all we need. The main loop iterates over each character in the string s, so in each iteration the variable char contains the current character.

Now let's increment the counters. For simplicity's sake, I'll focus on the loop instead of rewriting the code over and over again:

# the main loop
for char in s:
    if char in vowels:
        vowels_count += 1
    elif char in consonants:
        consonants_count += 1

The in operator is already known to us. First of all, we try to find the character char from our sentence in the vowels string and possibly increase their counter. If it's not included in the vowels, we look for it in the consonants and possibly increase their counter.
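
As a quick reminder of how the in operator behaves with strings, here is a small sketch (the tested characters are arbitrary examples):

```python
vowels = "aeiouy"

a_is_vowel = "a" in vowels          # True, "a" occurs in the string
b_is_vowel = "b" in vowels          # False, "b" does not occur in it
b_is_not_vowel = "b" not in vowels  # True, the negated form

print(a_is_vowel, b_is_vowel, b_is_not_vowel)
```

With strings, in actually looks for a substring, so "ei" in vowels would be True as well. Since our loop always tests a single character, this doesn't affect the counting.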

Now, all we're missing is printing the results at the end:

print("Vowels: %d" % vowels_count)
print("Consonants: %d" % consonants_count)
print("Non-alphanumeric characters: %d" % (len(s) - (vowels_count + consonants_count)))

Console application
A programmer gets stuck in the shower because the instructions on the shampoo were: Lather, Wash, and Repeat.
Vowels: 33
Consonants: 55
Non-alphanumeric characters: 21

That's it, we're done!
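
For reference, the pieces above can be put together as follows. This is only one possible arrangement: wrapping the logic in a function named analyze is an assumption made here for illustration, and the original program works just as well without it.

```python
def analyze(sentence):
    """Return the counts of vowels, consonants and other characters."""
    vowels = "aeiouy"
    consonants = "bcdfghjklmnpqrstvwxz"
    vowels_count = 0
    consonants_count = 0

    lowered = sentence.lower()
    for char in lowered:
        if char in vowels:
            vowels_count += 1
        elif char in consonants:
            consonants_count += 1

    # everything that is neither a vowel nor a consonant
    others_count = len(lowered) - (vowels_count + consonants_count)
    return vowels_count, consonants_count, others_count


sentence = "A programmer gets stuck in the shower because the instructions on the shampoo were: Lather, Wash, and Repeat."
vowels_count, consonants_count, others_count = analyze(sentence)
print("Vowels: %d" % vowels_count)
print("Consonants: %d" % consonants_count)
print("Non-alphanumeric characters: %d" % others_count)
```

To analyze text typed by the user, the hard-coded sentence can be replaced with a call to input().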

The ASCII value

Perhaps you've already heard of the ASCII table, especially if you remember the MS-DOS era, when there was practically no other way to store text. Individual characters were stored as numbers in the range from 0 to 255. The system provided a table of 256 characters in which each numerical code was assigned to one character. Strictly speaking, ASCII itself defines only the first 128 codes (0 to 127); the upper half came from an extended code page that differed between systems.

Hopefully, you understand why this method is no longer sufficient. The table simply could not contain all the characters of all international alphabets. Today we use Unicode, usually in the UTF-8 encoding, where characters are represented in a different way (strings are Unicode by default in Python 3, but not in Python 2). In Python, we can still work with the numeric values of individual characters, and for the basic characters these values match the ASCII table. The main advantage is that the letters are stored in the table next to each other, alphabetically. For example, at position 97 we can find "a", at 98 "b", etc. It is the same with digits. Unfortunately, accented characters don't follow this order and lie elsewhere in the table.

Now, let's convert a character into its ASCII value and, vice versa, create a character from its ASCII value:

# conversion from text to ASCII value
c = "a"  # character
i = ord(c)  # ordinal (ASCII) value of the character
print("The character %s was converted to its ASCII value of %d" % (c, i))
# conversion from an ASCII value to text
i = 98
c = chr(i)
print("The ASCII value of %d was converted to its textual value of %s" % (i, c))

We use the ord() function to get the ordinal (ASCII) value of a character and the chr() function to get the character from its ordinal value.
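
The following sketch illustrates the earlier point that letters and digits sit next to each other in the table, while accented characters do not. The sample characters are arbitrary, and building the alphabet in a loop is just one way to show the idea:

```python
# letters and digits have consecutive codes
print(ord("a"), ord("b"), ord("z"))
print(ord("0"), ord("9"))

# so the whole alphabet can be built by counting upwards from "a"
alphabet = ""
for offset in range(26):
    alphabet += chr(ord("a") + offset)
print(alphabet)

# an accented letter has a code far outside the a-z range,
# which is why sorting by code puts it after "z"
print(ord("é"))
print(sorted("zéa"))
```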

The Caesar cipher

Let's create a simple program to encrypt text. If you've ever heard of the Caesar cipher, then you already know exactly what we're going to program. The text encryption is based on shifting characters in the alphabet by a certain fixed number of characters. For example, if we shift the word "hello" by 1 character forwards, we'd get "ifmmp". The user will be allowed to select the number of character shifts.

Let's get right into it! We need variables for the original text, the encrypted message, and the shift. Then, we need a loop that iterates over each character, followed by printing the encrypted message. We'll hard-code the message for now, so we won't have to type it over and over during testing. After we finish the program, we'll replace the contents of the variable with a call to the input() function. The cipher doesn't work with accented characters, spaces, or punctuation marks; we'll just assume the user won't enter them. Ideally, we should remove accents before encryption and drop anything that isn't a letter.
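
The clean-up mentioned above is not part of the lesson's program, but it could be sketched roughly like this. The function name clean_message is hypothetical, and using the standard unicodedata module to split accented letters into a base letter plus an accent mark is just one possible approach:

```python
import unicodedata


def clean_message(text):
    """Lowercase the text, strip accents and keep only the letters a-z."""
    # NFKD decomposition turns e.g. "é" into "e" followed by a separate accent mark
    decomposed = unicodedata.normalize("NFKD", text.lower())
    cleaned = ""
    for char in decomposed:
        if "a" <= char <= "z":  # drops accent marks, spaces, digits and punctuation
            cleaned += char
    return cleaned


print(clean_message("Héllo, World!"))
```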

# variable initialization
s = "blackholesarewheregoddividedbyzero"
print("Original message: %s" % (s))
message = ""
shift = 1

# loop iterating over characters
for char in s:
    pass  # placeholder; the shifting logic is added in the next step

# printing
print("Encrypted message: %s" % (message))

Now, let's move into the loop. We'll convert the character char to its ASCII (ordinal) value, increase that value by the shift, and convert it back to a character. This character is then appended to the final message:

    i = ord(char)
    i += shift
    character = chr(i)
    message += character

Console application
Original message: blackholesarewheregoddividedbyzero
Encrypted message: cmbdlipmftbsfxifsfhpeejwjefecz{fsp

Let's try it out! The result looks pretty good. However, we can see that characters shifted past "z" overflow to the ASCII values of other characters (the "{" in the output above). The message therefore no longer consists only of letters. Let's make the alphabet cyclical, so that shifting flows smoothly from "z" back to "a" and so on. We'll get by with a simple condition that decreases the ASCII value by the length of the alphabet, so we end up back at "a".

    i = ord(char)
    i += shift
    # overflow control
    if i > ord("z"):
        i -= 26
    character = chr(i)
    message += character

If i exceeds the ASCII value of "z", we reduce it by 26 (the number of letters in the English alphabet). The -= operator does the same as writing i = i - 26. It's simple and our program is now operational. Notice that we don't use character codes directly anywhere. There's an ord("z") in the condition even though we could write 122 there directly. I set it up this way so that our program doesn't depend on explicit ASCII values and it's clearer how it works. Try to code the decryption program yourself as practice.
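
One limitation of the condition above is that a single subtraction is no longer enough once the shift is larger than 26. As an alternative sketch, not the lesson's own solution, the wrap-around can be computed with the modulo operator. The function name caesar_encrypt is hypothetical, and the code still assumes that the text contains only the lowercase letters a-z:

```python
def caesar_encrypt(text, shift):
    """Shift every letter of a lowercase a-z text forwards, wrapping around at "z"."""
    message = ""
    for char in text:
        position = ord(char) - ord("a")     # 0 for "a", 25 for "z"
        position = (position + shift) % 26  # stay within the alphabet
        message += chr(ord("a") + position)
    return message


print(caesar_encrypt("hello", 1))
print(caesar_encrypt("blackholesarewheregoddividedbyzero", 1))
```

Working with the position in the alphabet instead of the raw code keeps the arithmetic independent of where the letters happen to sit in the table.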

In the next lesson, Strings in Python - Split, we'll see that strings can do a couple more things we haven't touched on yet. Spoiler: We'll learn how to decode Morse code.



Article has been written for you by David Capka