When working with text, we often need to recognize strings that are similar to one another without being identical. This is common in record linkage, duplicate detection, and typo correction.
Measuring similarity between texts is not a straightforward task. If you try to implement it yourself, you will quickly find that it becomes complex and slow.
The Python standard library provides tools to detect differences between two sequences in the difflib module. Since text is itself a sequence (a sequence of characters), we can apply the provided functions to detect similarities between strings.
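As a rough illustration of the idea, the following sketch uses two real difflib entry points, SequenceMatcher.ratio() and get_close_matches(), on a few single words. The helper name similarity and the sample words are assumptions made for this example only; it is one possible way to score string similarity, not necessarily the approach the recipe itself follows.

```python
from difflib import SequenceMatcher, get_close_matches


def similarity(a, b):
    """Return a score between 0.0 (nothing in common) and 1.0 (identical)."""
    # The first argument is an optional "junk" filter; None disables it.
    return SequenceMatcher(None, a, b).ratio()


# A one-letter typo should score much higher than an unrelated word.
score_close = similarity('weather', 'weater')
score_far = similarity('weather', 'steak')
print(score_close > score_far)

# get_close_matches ranks a list of candidates by the same kind of score
# and returns the best ones (here, at most one).
candidates = ['weather', 'whether', 'steak']
closest = get_close_matches('weater', candidates, n=1)
print(closest)
```

The ratio is symmetric in spirit but can be sensitive to argument order in edge cases, so when ranking many candidates it is safer to always pass the reference string in the same position.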
Perform the following steps for this recipe:
- First, define the string that we want to compare against:
>>> s = 'Today the weather is nice'
- Furthermore, we want to compare a set of strings to the first string:
>>> s2 = 'Today the weater is nice'
>>> s3 = 'Yesterday the weather was nice'
>>> s4 = 'Today my dog ate steak'
- We can use...