We will now move on to adapting our Web crawler to Celery. We already have webcrawler_queue, which is responsible for encapsulating web crawler tasks. On the server side, we will create our crawl_task task inside the tasks.py module.
First, we will import the re and requests modules, which provide regular expressions and an HTTP client library, respectively. The code is as follows:

import re
import requests
Then, we will define our regular expression, which we studied in the previous chapters, as follows:

html_link_regex = re.compile(
    r'<a\s(?:.*?\s)*?href=[\'"](.*?)[\'"].*?>')
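As a quick sanity check outside Celery, the pattern can be exercised against a small, made-up HTML snippet (the snippet and URLs here are illustrative, not from the book's crawl targets):

```python
import re

# Same pattern as above, written as a raw string
html_link_regex = re.compile(
    r'<a\s(?:.*?\s)*?href=[\'"](.*?)[\'"].*?>')

# Illustrative HTML containing two anchor tags
sample = ('<p><a class="nav" href="http://example.com/a">A</a>'
          '<a href="/b">B</a></p>')

# findall returns only the captured href values
print(html_link_regex.findall(sample))
# → ['http://example.com/a', '/b']
```

Note that the lazy quantifiers (.*?) keep each match confined to a single anchor tag, so both href values are captured separately.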
Now, we will place our crawl_task function from the Web crawler in tasks.py, add the @app.task decorator, and change the return message a bit, as follows:

@app.task
def crawl_task(url):
    request_data = requests.get(url)
    links = html_link_regex.findall(request_data.text)
    message = "The task %s found the following links %s.." % (url, links)
    return message
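The two %s placeholders in the return message are filled with the requested URL and the list of extracted links. In isolation, the formatting behaves like this (the URL and link values here are made up for illustration; in the real task they come from requests and the regular expression):

```python
# Illustrative values standing in for a real crawl result
url = 'http://example.com'
links = ['http://example.com/a', '/b']

# Same %-style formatting as in crawl_task
message = "The task %s found the following links %s.." % (url, links)
print(message)
# → The task http://example.com found the following links ['http://example.com/a', '/b']..
```

On the client side, the task would then be dispatched asynchronously, for example with crawl_task.delay(url) or crawl_task.apply_async(), and this message collected through the configured result backend.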