Sign In Start Free Trial
Account

Add to playlist

Create a Playlist

Modal Close icon
You need to login to use this feature.
  • Book Overview & Buying PHP 7 Programming Cookbook
  • Table Of Contents Toc
  • Feedback & Rating feedback
PHP 7 Programming Cookbook

PHP 7 Programming Cookbook

By : Bierer
4.1 (8)
close
close
PHP 7 Programming Cookbook

PHP 7 Programming Cookbook

4.1 (8)
By: Bierer

Overview of this book

PHP 7 comes with a myriad of new features and great tools to optimize your code and make your code perform faster than in previous versions. Most importantly, it allows you to maintain high traffic on your websites with low-cost hardware and servers through a multithreading web server. This book demonstrates intermediate to advanced PHP techniques with a focus on PHP 7. Each recipe is designed to solve practical, real-world problems faced by PHP developers like yourself every day. We also cover new ways of writing PHP code made possible only in version 7. In addition, we discuss backward-compatibility breaks and give you plenty of guidance on when and where PHP 5 code needs to be changed to produce the correct results when running under PHP 7. This book also incorporates the latest PHP 7.x features. By the end of the book, you will be equipped with the tools and skills required to deliver efficient applications for your websites and enterprises.
Table of Contents (16 chapters)
close
close
15
Index

Building a deep web scanner

Sometimes you need to scan a website, but go one level deeper. For example, you want to build a web tree diagram of a website. This can be accomplished by looking for all <A> tags and following the HREF attributes to the next web page. Once you have acquired the child pages, you can then continue scanning in order to complete the tree.

How to do it...

  1. A core component of a deep web scanner is a basic Hoover class, as described previously. The basic procedure presented in this recipe is to scan the target website and hoover up all the HREF attributes. For this purpose, we define a Application\Web\Deep class. We add a property that represents the DNS domain:
    namespace Application\Web;
    class Deep
    {
        protected $domain;
  2. Next, we define a method that will hoover the tags for each website represented in the scan list. In order to prevent the scanner from trawling the entire World Wide Web (WWW), we've limited the scan to the target domain. The reason why yield from has been added is because we need to yield the entire array produced by Hoover::getTags(). The yield from syntax allows us to treat the array as a sub-generator:
    public function scan($url, $tag)
    {
        $vac    = new Hoover();
        $scan   = $vac->getAttribute($url, 'href', 
           $this->getDomain($url));
        $result = array();
        foreach ($scan as $subSite) {
            yield from $vac->getTags($subSite, $tag);
        }
        return count($scan);
    }

    Note

    The use of yield from turns the scan() method into a PHP 7 delegating generator. Normally, you would be inclined to store the results of the scan into an array. The problem, in this case, is that the amount of information retrieved could potentially be massive. Thus, it's better to immediately yield the results in order to conserve memory and to produce immediate results. Otherwise, there would be a lengthy wait, which would probably be followed by an out of memory error.

  3. In order to keep within the same domain, we need a method that will return the domain from the URL. We use the convenient parse_url() function for this purpose:
    public function getDomain($url)
    {
        if (!$this->domain) {
            $this->domain = parse_url($url, PHP_URL_HOST);
        }
        return $this->domain;
    }

How it works...

First of all, go ahead and define the Application\Web\Deep class defined previously, as well as the Application\Web\Hoover class defined in the previous recipe.

Next, define a block of code from chap_01_deep_scan_website.php that sets up autoloading (as described earlier in this chapter):

<?php
// modify as needed
define('DEFAULT_URL', unlikelysource.com');
define('DEFAULT_TAG', 'img');

require __DIR__ . '/../../Application/Autoload/Loader.php';
Application\Autoload\Loader::init(__DIR__ . '/../..');

Next, get an instance of our new class:

$deep = new Application\Web\Deep();

At this point, you can retrieve URL and tag information from URL parameters. The PHP 7 null coalesce operator is useful for establishing fallback values:

$url = strip_tags($_GET['url'] ?? DEFAULT_URL);
$tag = strip_tags($_GET['tag'] ?? DEFAULT_TAG);

Some simple HTML will display results:

foreach ($deep->scan($url, $tag) as $item) {
    $src = $item['attributes']['src'] ?? NULL;
    if ($src && (stripos($src, 'png') || stripos($src, 'jpg'))) {
        printf('<br><img src="%s"/>', $src);
    }
}

See also

For more information on generators and yield from, please see the article at http://php.net/manual/en/language.generators.syntax.php.

Visually different images
CONTINUE READING
83
Tech Concepts
36
Programming languages
73
Tech Tools
Icon Unlimited access to the largest independent learning library in tech of over 8,000 expert-authored tech books and videos.
Icon Innovative learning tools, including AI book assistants, code context explainers, and text-to-speech.
Icon 50+ new titles added per month and exclusive early access to books as they are being written.
PHP 7 Programming Cookbook
notes
bookmark Notes and Bookmarks search Search in title playlist Add to playlist download Download options font-size Font size

Change the font size

margin-width Margin width

Change margin width

day-mode Day/Sepia/Night Modes

Change background colour

Close icon Search
Country selected

Close icon Your notes and bookmarks

Confirmation

Modal Close icon
claim successful

Buy this book with your credits?

Modal Close icon
Are you sure you want to buy this book with one of your credits?
Close
YES, BUY

Submit Your Feedback

Modal Close icon
Modal Close icon
Modal Close icon