Book Image

Modern Python Cookbook - Second Edition

By : Steven F. Lott
Book Image

Modern Python Cookbook - Second Edition

By: Steven F. Lott

Overview of this book

Python is the preferred choice of developers, engineers, data scientists, and hobbyists everywhere. It is a great language that can power your applications and provide great speed, safety, and scalability. It can be used for simple scripting or sophisticated web applications. By exposing Python as a series of simple recipes, this book gives you insight into specific language features in a particular context. Having a tangible context helps make the language or a given standard library feature easier to understand. This book comes with 133 recipes on the latest version of Python 3.8. The recipes will benefit everyone, from beginners just starting out with Python to experts. You'll not only learn Python programming concepts but also how to build complex applications. The recipes will touch upon all necessary Python concepts related to data structures, object oriented programming, functional programming, and statistical programming. You will get acquainted with the nuances of Python syntax and how to effectively take advantage of it. By the end of this Python book, you will be equipped with knowledge of testing, web services, configuration, and application integration tips and tricks. You will be armed with the knowledge of how to create applications with flexible logging, powerful configuration, command-line options, automated unit tests, and good documentation.
Table of Contents (18 chapters)
16
Other Books You May Enjoy
17
Index

Rewriting an immutable string

How can we rewrite an immutable string? We can't change individual characters inside a string:

>>> title = "Recipe 5: Rewriting, and the Immutable String"
>>> title[8] = ''
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'str' object does not support item assignment

Since this doesn't work, how do we make a change to a string?

Getting ready

Let's assume we have a string like this:

>>> title = "Recipe 5: Rewriting, and the Immutable String"

We'd like to do two transformations:

  • Remove the part up to the :
  • Replace the punctuation with _, and make all the characters lowercase

Since we can't replace characters in a string object, we have to work out some alternatives. There are several common things we can do, shown as follows:

  • A combination of slicing and concatenating a string to create a new string.
  • When shortening, we often use the partition() method.
  • We can replace a character or a substring with the replace() method.
  • We can expand the string into a list of characters, then join the string back into a single string again. This is the subject of a separate recipe, Building complex strings with a list of characters.

How to do it...

Since we can't update a string in place, we have to replace the string variable's object with each modified result. We'll use an assignment statement that looks something like this:

some_string = some_string.method()

Or we could even use an assignment like this:

some_string = some_string[:chop_here]

We'll look at a few specific variations of this general theme. We'll slice a piece of a string, we'll replace individual characters within a string, and we'll apply blanket transformations such as making the string lowercase. We'll also look at ways to remove extra _ that show up in our final string.

Slicing a piece of a string

Here's how we can shorten a string via slicing:

  1. Find the boundary:
    >>> colon_position = title.index(':')
    

    The index function locates a particular substring and returns the position where that substring can be found. If the substring doesn't exist, it raises an exception. The following expression will always be true: title[colon_position] == ':'.

  2. Pick the substring:
    >>> discard, post_colon = title[:colon_position], title[colon_position+1:]
    >>> discard
    'Recipe 5'
    >>> post_colon
    ' Rewriting, and the Immutable String'
    

We've used the slicing notation to show the start:end of the characters to pick. We also used multiple assignment to assign two variables, discard and post_colon, from the two expressions.

We can use partition(), as well as manual slicing. Find the boundary and partition:

>>> pre_colon_text, _, post_colon_text = title.partition(':')
>>> pre_colon_text
'Recipe 5'
>>> post_colon_text
' Rewriting, and the Immutable String'

The partition function returns three things: the part before the target, the target, and the part after the target. We used multiple assignment to assign each object to a different variable. We assigned the target to a variable named _ because we're going to ignore that part of the result. This is a common idiom for places where we must provide a variable, but we don't care about using the object.

Updating a string with a replacement

We can use a string's replace() method to create a new string with punctuation marks removed. When using replace to switch punctuation marks, save the results back into the original variable. In this case, post_colon_text:

>>> post_colon_text = post_colon_text.replace(' ', '_')
>>> post_colon_text = post_colon_text.replace(',', '_')
>>> post_colon_text
'_Rewriting__and_the_Immutable_String'

This has replaced the two kinds of punctuation with the desired _ characters. We can generalize this to work with all punctuation. This leverages the for statement, which we'll look at in Chapter 2, Statements and Syntax.

We can iterate through all punctuation characters:

>>> from string import whitespace, punctuation
>>> for character in whitespace + punctuation:
...     post_colon_text = post_colon_text.replace(character, '_')
>>> post_colon_text
'_Rewriting__and_the_Immutable_String'

As each kind of punctuation character is replaced, we assign the latest and greatest version of the string to the post_colon_text variable.

We can also use a string's translate() method for this. This relies on creating a dictionary object to map each source character's position to a resulting character:

>>> from string import whitespace, punctuation
>>> title = "Recipe 5: Rewriting an Immutable String"
>>> title.translate({ord(c): '_' for c in whitespace+punctuation})
Recipe_5__Rewriting_an_Immutable_String

We've created a mapping with {ord(c): '_' for c in whitespace+punctuation} to translate any character, c, in the whitespace+punctuation sequence of characters to the '_' character. This may have better performance than a sequence of individual character replacements.

Removing extra punctuation marks

In many cases, there are some additional steps we might follow. We often want to remove leading and trailing _ characters. We can use strip() for this:

>>> post_colon_text = post_colon_text.strip('_')

In some cases, we'll have multiple _ characters because we had multiple punctuation marks. The final step would be something like this to clean up multiple _ characters:

>>> while '__' in post_colon_text:
...    post_colon_text = post_colon_text.replace('__', '_')

This is yet another example of the same pattern we've been using to modify a string in place. This depends on the while statement, which we'll look at in Chapter 2, Statements and Syntax.

How it works...

We can't—technically—modify a string in place. The data structure for a string is immutable. However, we can assign a new string back to the original variable. This technique behaves the same as modifying a string in place.

When a variable's value is replaced, the previous value no longer has any references and is garbage collected. We can see this by using the id() function to track each individual string object:

>>> id(post_colon_text)
4346207968
>>> post_colon_text = post_colon_text.replace('_','-')
>>> id(post_colon_text)
4346205488

Your actual ID numbers may be different. What's important is that the original string object assigned to post_colon_text had one ID. The new string object assigned to post_colon_text has a different ID. It's a new string object.

When the old string has no more references, it is removed from memory automatically.

We made use of slice notation to decompose a string. A slice has two parts: [start:end]. A slice always includes the starting index. String indices always start with zero as the first item. A slice never includes the ending index.

The items in a slice have an index from start to end-1. This is sometimes called a half-open interval.

Think of a slice like this: all characters where the index i is in the range start ≤ i < end.

We noted briefly that we can omit the start or end indices. We can actually omit both. Here are the various options available:

  • title[colon_position]: A single item, that is, the : we found using title.index(':').
  • title[:colon_position]: A slice with the start omitted. It begins at the first position, index of zero.
  • title[colon_position+1:]: A slice with the end omitted. It ends at the end of the string, as if we said len(title).
  • title[:]: Since both start and end are omitted, this is the entire string. Actually, it's a copy of the entire string. This is the quick and easy way to duplicate a string.

There's more...

There are more features for indexing in Python collections like a string. The normal indices start with 0 on the left. We have an alternate set of indices that use negative numbers that work from the right end of a string:

  • title[-1] is the last character in the title, 'g'
  • title[-2] is the next-to-last character, 'n'
  • title[-6:] is the last six characters, 'String'

We have a lot of ways to pick pieces and parts out of a string.

Python offers dozens of methods for modifying a string. The Text Sequence Type — str section of the Python Standard Library describes the different kinds of transformations that are available to us. There are three broad categories of string methods: we can ask about the string, we can parse the string, and we can transform the string to create a new one. Methods such as isnumeric() tell us if a string is all digits.

Here's an example:

>>> 'some word'.isnumeric()
False
>>> '1298'.isnumeric()
True

Before doing comparisons, it can help to change a string so that it has the same uniform case. It's frequently helpful to use the lower() method, thus assigning the result to the original variable:

>>> post_colon_text = post_colon_text.lower()

We've looked at parsing with the partition() method. We've also looked at transforming with the lower() method, as well as the replace() and translate() methods.

See also

  • We'll look at the string as list technique for modifying a string in the Building complex strings from lists of characters recipe.
  • Sometimes, we have data that's only a stream of bytes. In order to make sense of it, we need to convert it into characters. That's the subject of the Decoding bytes – how to get proper characters from some bytes recipe.