Modern Python Standard Library Cookbook

By: Alessandro Molina

Overview of this book

The Python 3 Standard Library is a vast array of modules that you can use for developing various kinds of applications. It contains an exhaustive list of libraries, and this book will help you choose the best one to address specific programming problems in Python. The Modern Python Standard Library Cookbook begins with recipes on containers and data structures and guides you in performing effective text management in Python. You will find Python recipes for command-line operations, networking, filesystems and directories, and concurrent execution. You will learn about Python security essentials and get to grips with various development tools for debugging, benchmarking, inspection, error reporting, and tracing. The book includes recipes to help you create graphical user interfaces for your application. You will learn to work with multimedia components and perform mathematical operations on dates and times. The recipes will also show you how to apply different searching and sorting algorithms to your data. By the end of the book, you will have acquired the skills needed to write clean code in Python and develop applications that meet your needs.

Normalizing text


In many cases, a single word can be written in multiple ways. For example, users who wrote "Über" and "Uber" probably meant the same word. If you were implementing a feature like tagging for a blog, you certainly wouldn't want to end up with two different tags for the two words.

So, before saving your tags, you might want to normalize them to plain ASCII characters so that they are all treated as the same tag.

How to do it...

What we need is a translation map that converts all accented characters to their plain representation:

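import unicodedata

class unaccented_map(dict):
    # str.translate looks up each character's code point in the map;
    # __missing__ runs only the first time a code point is requested.
    def __missing__(self, key):
        ch = self.get(key)
        if ch is not None:
            return ch
        # Decomposition string, for example '0055 0308' for 'Ü'.
        de = unicodedata.decomposition(chr(key))
        if de:
            try:
                # Keep only the first code point (the base character).
                ch = int(de.split(None, 1)[0], 16)
            except (IndexError, ValueError):
                # Entries such as '<compat> ...' are left unchanged.
                ch = key
        else:
            ch = key
        # Cache the result so later lookups are plain dict hits.
        self[key] = ch
        return ch

unaccented_map = unaccented_map()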
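To see what the map relies on, it can help to look at the raw strings returned by `unicodedata.decomposition`. The following is only an illustrative sketch: the helper name `base_code_point` is hypothetical and not part of the recipe, and it simply repeats the lookup logic for a single character.

```python
import unicodedata

def base_code_point(char):
    """Return the first code point of char's decomposition, or char's own code point."""
    de = unicodedata.decomposition(char)
    if not de:
        # No decomposition: the character is already "plain".
        return ord(char)
    try:
        # The decomposition is a space-separated list of hex code points.
        return int(de.split(None, 1)[0], 16)
    except ValueError:
        # Compatibility decompositions start with a tag such as '<compat>'.
        return ord(char)

# U+00DC is 'Ü'; it decomposes into 'U' (0055) plus a combining diaeresis (0308).
print(repr(unicodedata.decomposition('\u00dc')))
print(chr(base_code_point('\u00dc')))
print(chr(base_code_point('b')))
```

Characters without a decomposition, such as `b`, map to themselves, which is why the rest of a word passes through unchanged.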

Then we can apply it to any word to normalize it:

>>> 'Über'.translate(unaccented_map)
'Uber'
>>> 'garçon'.translate(unaccented_map)
'garcon'
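For the blog tagging scenario mentioned earlier, one possible way to use such a map is to normalize every tag before collecting the tags into a set. This is a minimal sketch under stated assumptions: the names `UnaccentedMap`, `normalize_tag`, and the sample `tags` list are hypothetical, and the class is a compact restatement of the recipe's map so that the example runs on its own.

```python
import unicodedata

class UnaccentedMap(dict):
    """Lazily maps a code point to the base code point of its decomposition."""

    def __missing__(self, key):
        de = unicodedata.decomposition(chr(key))
        try:
            # First hex field of the decomposition is the base character.
            ch = int(de.split(None, 1)[0], 16) if de else key
        except ValueError:
            # Tagged decompositions ('<compat> ...') are left unchanged.
            ch = key
        self[key] = ch  # cache for later lookups
        return ch

_unaccented = UnaccentedMap()

def normalize_tag(tag):
    """Hypothetical helper: strip accents from a tag before storing it."""
    return tag.translate(_unaccented)

# Sample data, invented for illustration.
tags = ['\u00dcber', 'Uber', 'gar\u00e7on', 'garcon']
unique_tags = sorted({normalize_tag(tag) for tag in tags})
print(unique_tags)
```

With this sample input, the four tags collapse into two. Case is left untouched here; whether tags should also be lowercased is a separate decision that the recipe does not address.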

How it works...