How can we rewrite an immutable string? We can't change individual characters inside a string:
>>> title = "Recipe 5: Rewriting, and the Immutable String"
>>> title[8] = ''
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'str' object does not support item assignment
Since this doesn't work, how do we make a change to a string?
Let's assume we have a string like this:
>>> title = "Recipe 5: Rewriting, and the Immutable String"
We'd like to do two transformations:
- Remove the part before the :
- Replace the punctuation with _, and make all the characters lowercase
Since we can't replace characters in a string object, we have to work out some alternatives. There are several common things we can do, shown as follows:
- A combination of slicing and concatenating a string to create a new string.
- When shortening, we often use the partition() method.
- We can replace a character or a substring with the replace() method.
- We can expand the string into a list of characters, then join the list back into a single string. This is the subject for a separate recipe, Building complex strings from lists of characters.
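To make the first alternative concrete, here is a minimal sketch of slicing plus concatenation. The helper name replace_char_at is hypothetical, not part of Python; it assumes a non-negative index and simply builds a new string around the character being replaced.

```python
def replace_char_at(text: str, index: int, new: str) -> str:
    """Return a new string with text[index] replaced by new.

    Hypothetical helper: assumes 0 <= index < len(text).
    The original string is never modified; the two slices and
    the + operator produce a brand-new string object.
    """
    if not 0 <= index < len(text):
        raise IndexError("string index out of range")
    return text[:index] + new + text[index + 1:]


title = "Recipe 5: Rewriting, and the Immutable String"
shorter = replace_char_at(title, 8, "")  # drop the ':' at index 8
print(shorter)  # Recipe 5 Rewriting, and the Immutable String
print(title)    # the original string is unchanged
```

Replacing with an empty string deletes the character, and replacing with a longer string inserts text, so the same slicing pattern covers both cases.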
Since we can't update a string in place, we have to replace the string variable's object with each modified result. We'll use a statement that looks like this:
some_string = some_string.method()
Or we could even use:
some_string = some_string[:chop_here]
We'll look at a number of specific variations on this general theme. We'll slice a piece of a string, we'll replace individual characters within a string, and we'll apply blanket transformations such as making the string lowercase. We'll also look at ways to remove extra _ characters that show up in our final string.
Here's how we can shorten a string via slicing:
- Find the boundary:
>>> colon_position = title.index(':')
The index() method locates a particular substring and returns the position where that substring can be found. If the substring doesn't exist, it raises a ValueError exception. The result always satisfies title[colon_position] == ':'.
- Pick the substring:
>>> discard_text, post_colon_text = title[:colon_position], title[colon_position+1:]
>>> discard_text
'Recipe 5'
>>> post_colon_text
' Rewriting, and the Immutable String'
We've used the slicing notation to show the start:end of the characters to pick. We also used multiple assignment to assign two variables, discard_text and post_colon_text, from two expressions.
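The index() call in the first step raises ValueError when the target is missing. If a sentinel value is more convenient, the related built-in find() method returns -1 instead. A short sketch of both behaviors:

```python
title = "Recipe 5: Rewriting, and the Immutable String"

colon_position = title.index(':')
assert title[colon_position] == ':'  # always true when index() succeeds

# index() raises ValueError when the substring is missing...
try:
    title.index('#')
except ValueError as error:
    print("not found:", error)

# ...while find() returns -1 instead of raising.
print(title.find('#'))  # -1
```

Which one to use is a design choice: index() makes a missing boundary an error, while find() leaves it to the caller to check for -1.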
Instead of manual slicing, we can use the partition() method to find the boundary and split the string in one step:
>>> pre_colon_text, _, post_colon_text = title.partition(':')
>>> pre_colon_text
'Recipe 5'
>>> post_colon_text
' Rewriting, and the Immutable String'
The partition() method returns three things: the part before the target, the target, and the part after the target. We used multiple assignment to assign each object to a different variable. We assigned the target to a variable named _ because we're going to ignore that part of the result. This is a common idiom for places where we must provide a variable, but we don't care about using the object.
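One practical difference from index() is that partition() never raises an exception. This sketch, using only the built-in method, shows what comes back when the separator is present and when it is absent:

```python
title = "Recipe 5: Rewriting, and the Immutable String"

# partition() always returns a 3-tuple: (before, separator, after).
before, sep, after = title.partition(':')
print(repr(before), repr(sep), repr(after))

# When the separator is absent, the whole string comes back as the first
# element and the other two are empty strings; no exception is raised.
before, sep, after = "No colon here".partition(':')
print(repr(before), repr(sep), repr(after))  # 'No colon here' '' ''
```

Checking whether the middle element is empty is therefore a simple way to tell if the separator was found.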
We can use replace() to switch punctuation marks to _. Save each result back into the original variable, in this case post_colon_text:
>>> post_colon_text = post_colon_text.replace(' ', '_')
>>> post_colon_text = post_colon_text.replace(',', '_')
>>> post_colon_text
'_Rewriting__and_the_Immutable_String'
This has replaced the two kinds of punctuation with the desired _ characters. We can generalize this to work with all punctuation. This leverages the for statement, which we'll look at in Chapter 2, Statements and Syntax.
We can iterate through all punctuation characters:
>>> from string import whitespace, punctuation
>>> for character in whitespace + punctuation:
...     post_colon_text = post_colon_text.replace(character, '_')
...
>>> post_colon_text
'_Rewriting__and_the_Immutable_String'
As each kind of punctuation character is replaced, we assign the latest and greatest version of the string to the post_colon_text variable.
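If the same cleanup is needed in several places, the loop can be wrapped in a function. The name underscore_punctuation is hypothetical; the body is the same rebinding loop as above, with the local variable text standing in for post_colon_text:

```python
from string import punctuation, whitespace


def underscore_punctuation(text: str) -> str:
    """Replace every whitespace and punctuation character with '_'.

    Hypothetical helper. Each replace() call builds a new string;
    rebinding the local name `text` keeps only the latest version.
    """
    for character in whitespace + punctuation:
        text = text.replace(character, '_')
    return text


print(underscore_punctuation(" Rewriting, and the Immutable String"))
# _Rewriting__and_the_Immutable_String
```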
Another transformational step is changing a string to all lowercase. As with the previous examples, we'll assign the results back to the original variable. Use the lower() method, assigning the result to the original variable:
>>> post_colon_text = post_colon_text.lower()
In many cases, there are some additional steps we might follow. We often want to remove leading and trailing _ characters. We can use strip() for this:
>>> post_colon_text = post_colon_text.strip('_')
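Note that strip() only trims characters from the ends; it never touches _ characters in the middle of the string. The related lstrip() and rstrip() methods trim one side each. A quick check with a made-up sample string:

```python
text = "__rewriting__and__the__string__"

# strip('_') removes any run of '_' from both ends only.
print(text.strip('_'))   # rewriting__and__the__string

# lstrip() and rstrip() trim just one side.
print(text.lstrip('_'))  # rewriting__and__the__string__
print(text.rstrip('_'))  # __rewriting__and__the__string
```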
In some cases, we'll have several _ characters in a row because we had multiple punctuation marks. The final step would be something like this to clean up multiple _ characters:
>>> while '__' in post_colon_text:
...     post_colon_text = post_colon_text.replace('__', '_')
...
This is yet another example of the same pattern we've been using to modify a string in place. This depends on the while statement, which we'll look at in Chapter 2, Statements and Syntax.
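Putting the steps together, the whole recipe can be collected into one function that rebinds a single local variable at each step. This is a sketch; the name title_to_name is hypothetical, and falling back to the whole title when there is no : is an assumption the recipe itself doesn't address:

```python
from string import punctuation, whitespace


def title_to_name(title: str) -> str:
    """Apply the recipe's steps in order (hypothetical helper)."""
    # 1. Drop everything up to and including the first ':'.
    #    Assumption: if there is no ':', keep the whole title.
    before, sep, after = title.partition(':')
    text = after if sep else before
    # 2. Replace whitespace and punctuation with '_'.
    for character in whitespace + punctuation:
        text = text.replace(character, '_')
    # 3. Make everything lowercase.
    text = text.lower()
    # 4. Remove leading and trailing '_'.
    text = text.strip('_')
    # 5. Collapse runs of '_' into a single '_'.
    while '__' in text:
        text = text.replace('__', '_')
    return text


print(title_to_name("Recipe 5: Rewriting, and the Immutable String"))
# rewriting_and_the_immutable_string
```

Every line in the body follows the text = text.method() shape described earlier, so each step produces a new string and the old one is discarded.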
Technically, we can't modify a string in place. The data structure for a string is immutable. However, we can assign a new string back to the original variable. This technique behaves the same as modifying a string in place.
When a variable is assigned a new value, the previous object loses that reference; once no references to it remain, it is garbage collected. We can see that each step creates a new object by using the id() function to track each individual string object:
>>> id(post_colon_text)
4346207968
>>> post_colon_text = post_colon_text.replace('_','-')
>>> id(post_colon_text)
4346205488
Your actual id numbers may be different. What's important is that the original string object assigned to post_colon_text had one id. The new string object assigned to post_colon_text has a different id. It's a new string object.
When the old string has no more references, it is removed from memory automatically.
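The word "automatically" hides one detail: an object is only reclaimed once every reference to it is gone. This sketch keeps a second name, alias (an illustrative name), pointing at the old string, so the rebinding leaves that object alive and unchanged:

```python
text = "_rewriting_and_the_immutable_string"
alias = text                     # a second reference to the same string object
original_id = id(text)

text = text.replace('_', '-')    # replace() builds a new string; `text` is rebound to it

print(id(text) == original_id)   # False: `text` now refers to a different object
print(alias is text)             # False
print(alias)                     # _rewriting_and_the_immutable_string
# `alias` still refers to the old string, so that object stays in memory.
# It becomes garbage only after the last reference to it is gone.
```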
We made use of slice notation to decompose a string. A slice has two parts: [start:end]. A slice always includes the starting index and never includes the ending index. String indices always start with zero as the first item.
Note
The items in a slice have an index from start to end-1. This is sometimes called a half-open interval.
Think of a slice like this: all characters whose index, i, is in the range start ≤ i < end.
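The half-open rule can be checked directly. This sketch confirms that a slice holds exactly end - start characters, that it matches picking each index i with start ≤ i < end, and that splitting at any position and concatenating the halves rebuilds the original. The particular index values are arbitrary choices for illustration:

```python
title = "Recipe 5: Rewriting, and the Immutable String"
start, end = 10, 19

piece = title[start:end]
print(repr(piece))  # 'Rewriting'

# A half-open slice holds exactly end - start characters...
assert len(piece) == end - start
# ...and matches picking each index i with start <= i < end.
assert piece == ''.join(title[i] for i in range(start, end))
# Splitting at any k and rejoining gives back the original string.
k = 8
assert title[:k] + title[k:] == title
```

The last check is one reason half-open intervals are convenient: the two pieces meet at k without overlapping or leaving a gap.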
We noted briefly that we can omit the start or end indices. We can actually omit both. Here are the various options available:
- title[colon_position]: A single item, the : we found using title.index(':').
- title[:colon_position]: A slice with the start omitted. It begins at the first position, index zero.
- title[colon_position+1:]: A slice with the end omitted. It ends at the end of the string, as if we said len(title).
- title[:]: Since both start and end are omitted, this is the entire string. For a mutable sequence such as a list, this idiom makes a copy; for an immutable string there is nothing to gain from a copy, and CPython simply returns the original string object.
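Each omitted index has an explicit equivalent: a missing start behaves like 0, and a missing end behaves like len(title). These assertions spell that out for the title used in this recipe:

```python
title = "Recipe 5: Rewriting, and the Immutable String"
colon_position = title.index(':')

# An omitted start means 0.
assert title[:colon_position] == title[0:colon_position]
# An omitted end means len(title).
assert title[colon_position + 1:] == title[colon_position + 1:len(title)]
# Omitting both gives a string equal to the whole title.
assert title[:] == title[0:len(title)] == title
```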
There are more features to indexing in Python collections like a string. The normal indices start with 0 at the left end. We have an alternate set of indices using negative numbers that work from the right end of a string.
- title[-1] is the last character in the title, 'g'
- title[-2] is the next-to-last character, 'n'
- title[-6:] is the last six characters, 'String'
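A negative index -k is shorthand for len(title) - k, and negative values work inside slices as well. A small check of the examples above, plus one slice that drops the final word:

```python
title = "Recipe 5: Rewriting, and the Immutable String"
n = len(title)

# A negative index -k refers to position len(title) - k.
assert title[-1] == title[n - 1] == 'g'
assert title[-2] == title[n - 2] == 'n'
assert title[-6:] == title[n - 6:] == 'String'

# Negative indices work in slices too: drop the last word and its space.
print(title[:-7])  # Recipe 5: Rewriting, and the Immutable
```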
We have a lot of ways to pick pieces and parts out of a string.
Python offers dozens of methods that create transformed copies of a string. Section 4.7 of the Python Standard Library describes the different kinds of transformations that are available to us. There are three broad categories of string methods: we can ask about a string, we can parse a string, and we can transform a string. Methods such as isnumeric() tell us whether all the characters in a string are numeric.
Here's an example:
>>> 'some word'.isnumeric()
False
>>> '1298'.isnumeric()
True
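Be careful reading isnumeric() as "can be converted to a number." It accepts Unicode numeric characters beyond 0 to 9, and it returns False for a sign, a decimal point, or an empty string. The related isdigit() and isdecimal() methods are progressively stricter. A sketch comparing the three on a few sample strings (the samples are our own choices):

```python
# Sample strings chosen to show where the three checks differ.
samples = [
    '1298',    # ASCII digits
    '\u00b2',  # SUPERSCRIPT TWO
    '\u00bd',  # VULGAR FRACTION ONE HALF
    '-12',     # the sign is not a numeric character
    '1.5',     # neither is the decimal point
    '',        # an empty string is never numeric
]

for s in samples:
    # Each check is True only if s is non-empty and every character passes.
    print(f"{s!r:8} decimal={s.isdecimal()} digit={s.isdigit()} numeric={s.isnumeric()}")
```

To find out whether a string can actually be parsed as a number, attempting int() or float() and handling ValueError is usually the more direct test.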
We've looked at parsing with the partition() method. And we've looked at transforming with the lower() method.
- We'll look at the string as list technique for modifying a string in the Building complex strings from lists of characters recipe.
- Sometimes we have data that's only a stream of bytes. In order to make sense of it, we need to convert it into characters. That's the subject for the Decoding bytes – how to get proper characters from some bytes recipe.