
Automate it! - Recipes to upskill your business

By: Chetan Giridhar

Overview of this book

<p>This book gives you a great selection of recipes to automate your business processes with Python, and provides a platform for you to understand how Python is useful in making time-consuming and repetitive business tasks more efficient. Python is a mature high-level language with object-oriented programming features; it powers various apps, has a huge set of modules, and enjoys great community support. Python is extremely easy to use, can help you get complex tasks done efficiently, and is an apt choice for our needs.</p> <p>With a classic problem-solution approach and real-world examples, you will delve into recipes that automate your business processes. You will begin by learning about the Python modules for working with the Web, worksheets, presentations, and PDFs. You'll leverage Python recipes to automate processes in HR and Finance, making them efficient and reliable. For instance, company payroll, an integral HR process, will be automated with Python recipes.</p> <p>A few chapters of this book will also help you gain knowledge of working with bots and computer vision. You will learn how to build bots for automating business use cases by integrating artificial intelligence. You'll also understand how Python is helpful in face detection and in building a scanner of your own. You will see how to effectively and easily use Python code to manage SMS and voice notifications, opening up a world of possibilities for using cloud telephony to solve your business needs. Moving forward, you will learn to work with APIs, webhooks, and emails to automate Marketing and Customer Support processes. Finally, using various Python libraries, this book will arm you with the knowledge to customize data solutions and generate reports to meet your business needs.</p> <p>The various Python recipes covered here will help you upskill and make your business processes efficient.</p>
Table of Contents (18 chapters)
Automate it!
Credits
About the Author
About the Reviewer
www.PacktPub.com
Customer Feedback
Preface

Downloading content from the Web


So, in the earlier recipe, we saw how to make HTTP requests, and you also learnt how to parse a web response. It's time to move ahead and download content from the Web. You know that the WWW is not just about HTML pages. It contains other resources, such as text files, documents, and images, among many other formats. Here, in this recipe, you'll learn ways to download images in Python with an example.

Getting ready

To download images, we will need two Python modules, namely BeautifulSoup and urllib2. We could use the requests module instead of urllib2, but using urllib2 here will help you learn about it as an alternative for making HTTP requests, so you have one more tool to boast about.
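A quick note before we start: on Python 3, the urllib2 module no longer exists; its functionality moved to urllib.request. As a sketch (this shim is our own suggestion, not part of the original recipe), the following lets the code in this recipe run on either version:

```python
# Compatibility shim: on Python 3, expose urllib.request under the
# name urllib2 so the recipe's calls (urlopen, Request) still work.
try:
    import urllib2  # Python 2
except ImportError:
    import urllib.request as urllib2  # Python 3 equivalent

# Both urlopen() and Request are available either way.
print(hasattr(urllib2, 'urlopen'), hasattr(urllib2, 'Request'))
```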

How to do it...

  1. Before starting this recipe, we need to answer two questions. What kind of images would we like to download? From which location on the Web do we download the images? In this recipe, we download Avatar movie images from Google Images search (https://google.com). We download the top five images that match the search criteria. To do this, let's import the Python modules and define the variables we'll need:

            from bs4 import BeautifulSoup
            import re
            import urllib2
            import os 
            ## Download parameters
            image_type = "Project"
            movie = "Avatar"
            url = "https://www.google.com/search?q="+movie+"&source=lnms&tbm=isch"

  2. OK then, let's now create a BeautifulSoup object from the URL with appropriate headers. Note the use of the User-Agent header while making HTTP calls with Python's urllib2 module; the requests module sends its own default User-Agent when making HTTP calls:

            header = {'User-Agent': 'Mozilla/5.0'}
            request = urllib2.Request(url, headers=header)
            soup = BeautifulSoup(urllib2.urlopen(request), 'html.parser')

  3. Google images are hosted as static content under the domain name http://www.gstatic.com/. So, using the BeautifulSoup object, we now find all the images whose source URL contains gstatic.com and keep the first five. The following code does exactly that:

            images = [a['src'] for a in soup.find_all("img", {"src":
                      re.compile("gstatic.com")})][:5]
            for img in images:
                print "Image Source:", img

    The output of the preceding code snippet can be seen in the following screenshot. Note how we get the image source URL on the Web for the top five images:

  4. Now that we have the source URL of all the images, let's download them. The following Python code uses the urlopen() method to read() the image and downloads it onto the local file system:

            for img in images:
                raw_img = urllib2.urlopen(img).read()
                cntr = len([i for i in os.listdir(".") if image_type in i]) + 1
                f = open(image_type + "_" + str(cntr) + ".jpg", 'wb')
                f.write(raw_img)
                f.close()

  5. When the images get downloaded, we can see them in our editor. The following snapshot shows the top five images we downloaded; Project_3.jpg looks as follows:

How it works...

So, in this recipe, we looked at downloading content from the Web. First, we defined the parameters for the download. Parameters are like configurations that define the location where the downloadable resource is available and what kind of content is to be downloaded. In our example, we specified that we have to download Avatar movie images, and from Google at that.

Then we made the URL request with the urllib2 module and created a BeautifulSoup object from the response. Actually, urllib2.Request() prepares the request with the configuration, such as the headers and the URL itself, and urllib2.urlopen() actually makes the request. We wrapped the HTML response from the urlopen() method in a BeautifulSoup object so that we could parse the HTML response.
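To make the header handling concrete, here is a minimal sketch (using Python 3's urllib.request, the successor to urllib2; the URL below is a placeholder, not the recipe's real search URL) showing that the User-Agent is attached to the request object before any network call is made:

```python
from urllib.request import Request

# Prepare a request with a custom User-Agent, exactly as the recipe
# does with urllib2.Request(url, headers=header).
header = {'User-Agent': 'Mozilla/5.0'}
req = Request('https://www.example.com/search?q=Avatar', headers=header)

# The header is stored on the request object; urlopen(req) would
# then send it to the server. Note that urllib normalizes the
# header name's capitalization internally.
print(req.get_header('User-agent'))  # Mozilla/5.0
```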

Next, we used the soup object to search for the top five images present in the HTML response. We searched for img tags whose src attribute matches gstatic.com with the find_all() method. find_all() returns a list of matching tags; the list comprehension then extracts the src URL from each tag and the [:5] slice keeps the first five.

Finally, we iterated through all the URLs and again used the urlopen() method on each URL to read() the image. read() returns the image in a raw format as binary data, which we then wrote to a file on our local file system. We also added logic to name the images with an auto-incrementing counter so that they're uniquely identified in the local file system.
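The naming logic can be sketched in isolation. Using a temporary directory for illustration (the recipe itself writes to the current working directory), the counter is simply one more than the number of files already containing image_type in their name:

```python
import os
import tempfile

image_type = "Project"

with tempfile.TemporaryDirectory() as d:
    # Simulate two previously downloaded images.
    for name in ("Project_1.jpg", "Project_2.jpg"):
        open(os.path.join(d, name), 'wb').close()

    # Same counting logic as the recipe, pointed at the temp dir.
    cntr = len([i for i in os.listdir(d) if image_type in i]) + 1
    next_name = image_type + "_" + str(cntr) + ".jpg"
    print(next_name)  # Project_3.jpg
```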

That's nice! Exactly what we wanted to achieve! Now let's up the ante a bit and see what else we can explore in the next recipe.