Bash Cookbook

By: Ron Brash, Ganesh Sanjiv Naik


Scraping the web and collecting files

In this recipe, we will learn how to collect data from a website by web scraping, using a short script built around the wget and grep commands.

Getting ready

Besides having a Terminal open, you need to have basic knowledge of the grep and wget commands.
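
If you are not sure whether both tools are available on your system, you can check with something like the following (the apt line is one possible install command, for Debian- or Ubuntu-style systems only):

$ command -v wget grep
$ sudo apt install wget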

How to do it…

Now, we will write a script to scrape the contents of imdb.com, using the wget command to download pages and the grep command to extract links from them. Create a scrap_contents.sh script and write the following code in it:

#!/bin/bash
# Mirror the site (recursively, up to 5 levels deep) into a scratch directory.
mkdir -p data
cd data
wget -q -r -l 5 -x https://imdb.com
cd ..
# Pull every href="..." value out of the downloaded pages.
grep -r -Po -h '(?<=href=")[^"]*' data/ > links.csv
# Keep only absolute http(s) URLs, then de-duplicate them.
grep "^http" links.csv > links_filtered.csv
sort -u links_filtered.csv > links_final.csv
# Remove the mirrored pages and the intermediate files.
rm -rf data links.csv links_filtered.csv
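
Assuming you saved the script as scrap_contents.sh, you can run it and inspect the collected links with something like the following (the mirror step may take a while, and the exact links will depend on what wget was able to download):

$ bash scrap_contents.sh
$ head links_final.csv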

How it works…

The preceding script mirrors a website and collects the links it contains. The wget utility retrieves files from the web over the HTTP, HTTPS, and FTP protocols. In this example, we are getting data from imdb.com, and therefore we specified...
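
To see the link-extraction step in isolation, you can try the same grep pattern on a small sample file (test.html here is just a hypothetical name for illustration). The -P flag enables Perl-compatible regular expressions, -o prints only the matched text, and the lookbehind (?<=href=") anchors each match just after an href attribute without including the attribute itself in the output:

$ echo '<a href="https://imdb.com/chart/top">Top 250</a>' > test.html
$ grep -Po '(?<=href=")[^"]*' test.html
https://imdb.com/chart/top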