Book Image

UI Testing with Puppeteer

By : Dario Kondratiuk
Book Image

UI Testing with Puppeteer

By: Dario Kondratiuk

Overview of this book

Puppeteer is an open source web automation library created by Google to perform tasks such as end-to-end testing, performance monitoring, and task automation with ease. Using real-world use cases, this book will take you on a pragmatic journey, helping you to learn Puppeteer and implement best practices to take your automation code to the next level! Starting with an introduction to headless browsers, this book will take you through the foundations of browser automation, showing you how far you can get using Puppeteer to automate Google Chrome and Mozilla Firefox. You’ll then learn the basics of end-to-end testing and understand how to create reliable tests. You’ll also get to grips with finding elements using CSS selectors and XPath expressions. As you progress through the chapters, the focus shifts to more advanced browser automation topics such as executing JavaScript code inside the browser. You’ll learn various use cases of Puppeteer, such as mobile devices or network speed testing, gauging your site’s performance, and using Puppeteer as a web scraping tool. By the end of this UI testing book, you’ll have learned how to make the most of Puppeteer’s API and be able to apply it in your real-world projects.
Table of Contents (12 chapters)

Creating scrapers

Let's try to scrape book prices from the Packt site. The terms and conditions said nothing about scrapers (https://www.hardkoded.com/ui-testing-with-puppeteer/packtpub-terms). But the robots.txt file has some clear rules:

User-agent: *
Disallow: /index.php/
Disallow: /*?
Disallow: /checkout/
Disallow: /app/
Disallow: /lib/
Disallow: /*.php$
Disallow: /pkginfo/
Disallow: /report/
Disallow: /var/
Disallow: /catalog/
Disallow: /customer/
Disallow: /sendfriend/
Disallow: /review/
Disallow: /*SID=

They don't want us to go to those pages. But the site has a pretty massive sitemap.xml, with over 9,000 lines. If robots.txt is the "don't go here" sign for scrapers, sitemap.xml is the "please, check this out" sign. These are the first items on the sitemap.xml file:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
    xmlns...