We know that search engines send out autonomous programs called bots to find information on the Internet. Usually, this leads to the creation of giant indices, similar to a phonebook or a dictionary. The current situation (September 2015) is not ideal for Python 3 users when it comes to scraping the Web, since most scraping frameworks only support Python 2. However, Guido van Rossum, Python's Benevolent Dictator for Life (BDFL), has just contributed a crawler on GitHub that uses the asyncio API. All hail the BDFL!
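To make the asyncio pattern concrete, here is a minimal sketch of the kind of event-loop-driven crawl the BDFL's crawler performs. The link graph is a hypothetical in-memory stand-in for real HTTP responses (the actual crawler fetches pages over the network with aiohttp); the `async`/`await` syntax requires Python 3.5 or later, and `asyncio.run()` requires Python 3.7 or later:

```python
import asyncio

# Hypothetical link graph standing in for real HTTP responses;
# the real crawler would fetch these pages over the network.
FAKE_WEB = {
    'http://example.com/': ['http://example.com/a', 'http://example.com/b'],
    'http://example.com/a': ['http://example.com/b'],
    'http://example.com/b': [],
}

async def fetch_links(url):
    # Simulate network latency; a real implementation would do an
    # asynchronous HTTP request here instead.
    await asyncio.sleep(0)
    return FAKE_WEB.get(url, [])

async def crawl(root):
    """Visit every page reachable from root, skipping duplicates."""
    seen, queue = set(), [root]
    while queue:
        url = queue.pop()
        if url in seen:
            continue
        seen.add(url)
        queue.extend(await fetch_links(url))
    return seen

urls = asyncio.run(crawl('http://example.com/'))
print(sorted(urls))
```

The real crawler schedules many such fetches concurrently on the event loop, which is where asyncio pays off; this sketch only shows the coroutine structure.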
I forked the repository and made small changes so that crawled URLs are saved and the crawler exits early. These changes are not very elegant, but they were all I could do in a limited time frame. Anyway, I can't hope to do better than the BDFL himself.
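The two changes amount to capping the number of pages visited and writing the collected URLs to a file. The sketch below illustrates the idea with a hypothetical `crawl_limited` function over a toy in-memory link graph; it is not the forked crawler's actual code:

```python
import os
import tempfile

# Toy link graph; the real crawler fetches these pages over HTTP.
LINKS = {
    'http://example.com/': ['http://example.com/a', 'http://example.com/b'],
    'http://example.com/a': [],
    'http://example.com/b': [],
}

def crawl_limited(root, max_urls):
    """Breadth-first crawl that exits early after max_urls pages."""
    seen, queue = set(), [root]
    while queue and len(seen) < max_urls:  # early exit once the cap is hit
        url = queue.pop(0)
        if url not in seen:
            seen.add(url)
            queue.extend(LINKS.get(url, []))
    return seen

def save_urls(urls, path):
    """Persist the crawled URLs, one per line."""
    with open(path, 'w') as f:
        f.write('\n'.join(sorted(urls)))

path = os.path.join(tempfile.gettempdir(), 'crawled_urls.txt')
urls = crawl_limited('http://example.com/', max_urls=2)
save_urls(urls, path)
```

Saving the URLs to a plain text file, one per line, keeps them easy to read back for the next step of the recipe.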
Once we have a list of web links, we will load these webpages with Selenium (refer to the Simulating web browsing recipe). I chose PhantomJS, a headless browser, which should have a lighter footprint than Firefox. Although this is not...