Lesson 9 - Strings in Python - Split


In the previous tutorial, Strings in Python - Working with single characters, we saw that Python strings are essentially sequences of characters. In today's lesson, we're going to explain the split() string method, which I intentionally held back until we knew that strings behave much like lists :)

split()

From the previous tutorial, we know that parsing strings character by character can be rather complicated, even though our example was fairly simple. Of course, we'll encounter strings all the time. They're present in user inputs (from the console or from input fields in form applications), and in TXT and XML files. Very often, we're given one long string, such as a line in a file or in the console, which contains multiple values separated by separators, e.g. commas. When the separator is a comma, we're talking about the CSV format (Comma-Separated Values). To make sure that we're all on the same page, let's look at some sample strings:

Jessie,Brown,Wall Street 10,New York,130 00
.. ... .-.. .- -. -.. ... --- ..-. -
(1,2,3;4,5,6;7,8,9)
  • The first string represents a user. We could, for example, store users into a CSV file (one per line).
  • The second string is Morse code characters and uses a space character as a separator.
  • The third string is a matrix of 3 columns and 3 rows. The column separator is a comma, whereas the row separator is a semicolon.

We can call the split() method on a string and pass it a separator as a parameter. It then splits the original string at every occurrence of the separator and returns the substrings as a list. This greatly simplifies extracting values from strings for our current purposes.
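As a minimal sketch, here is how split() might be applied to the three sample strings above. The variable names are chosen just for this illustration, and the parentheses around the matrix are left out to keep the example short:

```python
# The three sample strings (variable names are illustrative only)
user = "Jessie,Brown,Wall Street 10,New York,130 00"
morse = ".. ... .-.. .- -. -.. ... --- ..-. -"
matrix = "1,2,3;4,5,6;7,8,9"  # parentheses omitted for simplicity

# Split the user record on commas and the Morse code on spaces
userFields = user.split(",")
morseChars = morse.split(" ")

# Split the matrix into rows first, then each row into columns
rows = []
for row in matrix.split(";"):
    rows.append(row.split(","))

print(userFields)
print(morseChars)
print(rows)
```

Note that every value we get back is still a string, including the numbers in the matrix.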

We're also already familiar with the join() method, which does the opposite. It's called directly on the separator string and joins a sequence of substrings into a single string, placing the separator between them. The parameter is the sequence; the return value is the resulting string.
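A short sketch of how join() and split() complement each other (the sample values are made up for this illustration):

```python
# A list of substrings we want to store as one CSV line
fields = ["Jessie", "Brown", "New York"]

# join() is called on the separator and takes the sequence as its parameter
line = ",".join(fields)
print(line)

# Splitting the line on the same separator gives the original list back
print(line.split(","))
```

This round trip only works as long as none of the values contains the separator itself.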

Right then, let's see what we've got up until now. We still don't know how to declare objects, users, or even work with multidimensional arrays, i.e. matrices. Nevertheless, we want to make something cool, so we'll settle for making a Morse code message decoder.

Morse code decoder

We'll start out by preparing the structure of the program, as always. We need two strings for the messages: one for the message in Morse code, and another, empty for now, in which we'll store the result of our efforts. Next, we need letter definitions (as we had with vowels) and, of course, the corresponding Morse code definitions. The letters can be stored in a single string since each of them consists of only one character. Morse code characters consist of multiple characters, so we have to store them in a list.

The structure of our program should now look something like this:

# the string which we want to decode
s = ".. -.-. - ... --- -.-. .. .- .-.."
print("The original message: %s" %(s))
# a string with a decoded message
message = ""

# letter and Morse code definitions
alphabetChars = "abcdefghijklmnopqrstuvwxyz"
morseChars = [".-", "-...", "-.-.", "-..", ".", "..-.", "--.", "....",
    "..", ".---", "-.-", ".-..", "--", "-.", "---", ".--.", "--.-", ".-.", "...", "-", "..-",
    "...-", ".--", "-..-", "-.--", "--.."]

We could also add other Morse characters such as numbers and punctuation marks, but we won't worry about them (for now). We'll split the string s with the split() method into a list of substrings containing the individual Morse characters. We'll use the space character as the separator. Then, we'll iterate over the list using a for loop:

# splitting a string into Morse characters
characters = s.split(" ")

# iteration over Morse characters
for morseChar in characters:
    pass  # placeholder, the loop body is added in the next step

Ideally, we should somehow deal with cases when the user enters things like multiple spaces between characters (users often do things of the sort). In that case, split(" ") puts an empty substring into the list for every extra space. We could then detect these empty substrings in the loop and ignore them, but we won't deal with that in this lesson.
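For the curious, here is a small sketch of what happens with extra spaces. It isn't used in the decoder in this lesson; it only illustrates the behavior described above. Calling split() without any parameter splits on runs of whitespace instead, so no empty substrings appear:

```python
# A Morse message with extra spaces between the characters
spaced = "..  -.-.   -"

# With an explicit separator, each extra space yields an empty substring
withEmpty = spaced.split(" ")
print(withEmpty)

# Without a parameter, split() treats any run of whitespace as one separator
withoutEmpty = spaced.split()
print(withoutEmpty)
```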

In the loop, we'll attempt to find the current Morse character in the morseChars list. We're interested in its index because the corresponding letter sits at that same index in the alphabetChars string. This works because the list and the string hold the same letters in the same alphabetical order. Let's place the following code into the loop body:

    alphabetChar = "?"
    if morseChar in morseChars: # character was found
        index = morseChars.index(morseChar)
        alphabetChar = alphabetChars[index]
    message += alphabetChar

First, the alphabetical character is set to "?" since it may very well be that the Morse character isn't defined in our list. Then we check whether the Morse character is in the list. If it is, we determine its index and assign the character at that index in alphabetChars to alphabetChar. Finally, we append the character to the message. The statement message += alphabetChar is shorthand for message = message + alphabetChar.

Now, we'll print the message:

print("The decoded message: %s" % (message))

Console application
The original message: .. -.-. - ... --- -.-. .. .- .-..
The decoded message: ictsocial
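For reference, the fragments from this lesson can be assembled into one runnable program. The following is just one way of putting them together; the code is the same as above, only in one place:

```python
# the string which we want to decode
s = ".. -.-. - ... --- -.-. .. .- .-.."
print("The original message: %s" % (s))
# a string with a decoded message
message = ""

# letter and Morse code definitions (same order in both)
alphabetChars = "abcdefghijklmnopqrstuvwxyz"
morseChars = [".-", "-...", "-.-.", "-..", ".", "..-.", "--.", "....",
    "..", ".---", "-.-", ".-..", "--", "-.", "---", ".--.", "--.-", ".-.", "...", "-", "..-",
    "...-", ".--", "-..-", "-.--", "--.."]

# splitting the string into Morse characters
characters = s.split(" ")

# iteration over Morse characters
for morseChar in characters:
    alphabetChar = "?"  # used when the Morse character is unknown
    if morseChar in morseChars:  # character was found
        index = morseChars.index(morseChar)
        alphabetChar = alphabetChars[index]
    message += alphabetChar

print("The decoded message: %s" % (message))
```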

Done! If you want to practice some more, you can create a program that encodes a string into Morse code. The code would be very similar. We'll use the split() and join() methods several more times throughout our courses.
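If you get stuck on the encoder, here is one possible sketch. It's only a suggestion, not the official solution; the names text, morseCodes, and encoded are made up for this example, and it assumes the input contains lowercase letters only:

```python
# letter and Morse code definitions (same order in both)
alphabetChars = "abcdefghijklmnopqrstuvwxyz"
morseChars = [".-", "-...", "-.-.", "-..", ".", "..-.", "--.", "....",
    "..", ".---", "-.-", ".-..", "--", "-.", "---", ".--.", "--.-", ".-.", "...", "-", "..-",
    "...-", ".--", "-..-", "-.--", "--.."]

# the text we want to encode (assumed to be lowercase letters)
text = "ictsocial"

# collect the Morse character for every letter
morseCodes = []
for letter in text:
    morseChar = "?"  # used when the letter is unknown
    if letter in alphabetChars:
        index = alphabetChars.index(letter)
        morseChar = morseChars[index]
    morseCodes.append(morseChar)

# join the Morse characters with spaces, the reverse of split(" ")
encoded = " ".join(morseCodes)
print("The encoded message: %s" % (encoded))
```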

Special characters and escaping

Strings can contain special characters, which are prefixed with a backslash "\". The most common ones are \n, which causes a line break anywhere in the text, and \t, which is the tab character.

Let's test them out:

print("First line\nSecond line")

The "\" character introduces an escape sequence in a string. It can also be used, e.g., to write Unicode characters as "\uxxxx", where xxxx is the four-digit hexadecimal code of the character.

A problem arises when we want to write "\" itself. In this case, we have to escape it by writing one more "\":

print("This is a backslash: \\")
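A few of the escape sequences mentioned so far, side by side. This is only a small sketch with made-up sample values; the thing to notice is that each escape sequence takes several characters to write in the code but produces a single character in the resulting string:

```python
# \t produces a single tab character
tabbed = "Name\tCity"
print(tabbed)

# \\ produces a single backslash
path = "C:\\temp"
print(path)

# \u20ac is the Unicode code of the euro sign
euro = "\u20ac"
print("Price: 10 " + euro)

# each escape sequence counts as one character
print(len("\t"), len("\\"), len(euro))
```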

We can escape a quotation mark in the same way, so that Python doesn't misinterpret it as the end of the string:

print("This is a quotation mark: \"")

We can also take advantage of the fact that Python supports both single and double quotes. If we need to write double quotes in a string, we don't have to escape them. We can put the string into single quotes instead:

print('Yes, that was my "lucky" day, I lost my car keys!')

When we want a string to contain line breaks, it can be useful to declare it using triple quotes:

s = """The first line
the second line"""
print(s)

The output:

Console application
The first line
the second line

Text entered in the console or in input fields of form applications is, of course, taken literally. Python doesn't interpret escape sequences in it, so users don't have to worry about escaping. Escape sequences only apply to strings written directly in the source code, so as programmers, we have to keep escaping in mind.
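To illustrate the difference without actually reading input, the following sketch simulates what a program would receive if the user typed a backslash followed by the letter n. The variable names are made up for this example:

```python
# what we'd get if the user typed: a\nb (five keystrokes)
typed = "a\\nb"

# what we get when we write \n ourselves in the code
inCode = "a\nb"

print(typed)        # printed on one line, backslash included
print(len(typed))   # the backslash and the n are two separate characters

print(inCode)       # printed on two lines
print(len(inCode))  # the line break is a single character
```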

In the next lesson, we'll learn all about multidimensional arrays.


 

 

Article has been written for you by David Capka