When working with web-based software, it's frequently necessary to understand links, protocols, and paths.
You might be tempted to rely on regular expressions or string splitting to parse URLs, but once you account for all the oddities a URL can include (such as credentials, ports, or unusual protocols), parsing turns out to be harder than you might expect.
Python provides utilities in the urllib and cgi modules that account for the many different formats a URL can have. (Note that the cgi module was deprecated in Python 3.11 and removed in Python 3.13.)
Relying on them instead of hand-written parsing makes life easier and your software more robust.
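To make the point concrete, here is a minimal sketch comparing a naive split with urllib.parse.urlparse on a URL that carries credentials and a port. The naive_host helper and the example URL are hypothetical, invented only for this illustration.

```python
from urllib.parse import urlparse


def naive_host(url):
    """Hypothetical naive helper: take whatever sits between '://' and the next '/'."""
    return url.split("://", 1)[1].split("/", 1)[0]


# Example URL (an assumption for illustration) with credentials and a port.
url = "https://user:secret@www.example.com:8443/docs/index.html?lang=en"

naive = naive_host(url)
parsed = urlparse(url)

print(naive)            # credentials and port are still mixed into the "host"
print(parsed.hostname)  # only the host name
print(parsed.port)      # the port, as an integer
print(parsed.username)  # the user name from the credentials
```

The naive version returns the credentials and the port glued to the host, while urlparse exposes each piece as a separate attribute.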
The urllib.parse module has multiple tools to parse URLs. The most commonly used solution is to rely on urllib.parse.urlparse, which can handle the most widespread kinds of URLs:
import urllib.parse

def parse_url(url):
    """Parses a URL of the most widespread format.

    This takes for granted there is a single set of parameters
    for the whole path.
    """
    parts = urllib.parse.urlparse(url...
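As a rough illustration of what urlparse gives back, the following sketch collects the parsed components into a dictionary. It is not the original recipe: the describe_url name, the dictionary layout, and the sample URL are assumptions made for this example, and parse_qs is used here only to show one way of decoding the query string.

```python
from urllib.parse import urlparse, parse_qs


def describe_url(url):
    """Hypothetical helper: return the main components of a URL as a dict."""
    parts = urlparse(url)
    return {
        "scheme": parts.scheme,
        "username": parts.username,
        "password": parts.password,
        "hostname": parts.hostname,
        "port": parts.port,
        "path": parts.path,
        # urlparse keeps a single set of ';' parameters, taken from the
        # last path segment.
        "params": parts.params,
        # parse_qs decodes the query into a dict of lists of values.
        "query": parse_qs(parts.query),
        "fragment": parts.fragment,
    }


info = describe_url(
    "http://user:pwd@host.com:80/path/subpath;p=1?search=hello+world&page=2#section"
)
for key, value in info.items():
    print(f"{key}: {value!r}")
```

Returning a plain dictionary is only a convenience for printing; the object returned by urlparse is a named tuple whose attributes can be used directly.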