D Cookbook

By: Adam Ruppe

Parsing and modifying an HTML page with dom.d


My dom.d module is an HTML and XML parser that can understand much of the tag soup found on the Web. Once it parses a document, it provides a JavaScript-style DOM API for easy inspection and manipulation of the document tree.
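To make the idea of a JavaScript-style API concrete, here is a minimal sketch (not the book's own listing) that parses some deliberately sloppy markup with parseGarbage and collects link targets with querySelectorAll and getAttribute, the same method names a browser's DOM offers. It assumes dom.d and characterencodings.d have been downloaded and are compiled alongside the program; the helper name linkTargets and the sample markup are hypothetical.

```d
// A minimal sketch, not the book's listing. Assumes dom.d and
// characterencodings.d sit next to this file and are compiled with it:
//   dmd links.d dom.d characterencodings.d
import arsd.dom;
import std.stdio : writeln;

/// Hypothetical helper: returns the href of every <a> element,
/// much like document.querySelectorAll("a") in browser JavaScript.
string[] linkTargets(string html)
{
    auto document = new Document();
    // parseGarbage is the lenient parser: upper-case tag names and an
    // unquoted attribute value are tolerated instead of rejected.
    document.parseGarbage(html);

    string[] targets;
    foreach (link; document.querySelectorAll("a"))
        targets ~= link.getAttribute("href");
    return targets;
}

void main()
{
    // Tag soup: upper-case tags, one unquoted and one single-quoted value.
    writeln(linkTargets(
        `<HTML><BODY><A href=one.html>One</A> and <a href='two.html'>two</a></BODY></HTML>`));
}
```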

Here, we'll use the library to extract some meta-information and text from an HTML page, then modify the page and save a local copy. Along the way, we'll explore the library's features and its implementation, which uses several of the techniques covered in this book.
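The whole workflow (extract, modify, save) can be sketched end to end as follows. This is an illustration under stated assumptions, not the book's own code: the helper names metaDescription and makeLocalCopy, the output file name local-copy.html, the built-in sample page, and the particular edits (retitling the page and adding rel="nofollow" to every link) are all hypothetical. It relies on the Document and Element members title, querySelectorAll, getAttribute, setAttribute, innerText, and toString from dom.d.

```d
// A minimal sketch, not the book's listing. Assumes dom.d and
// characterencodings.d are compiled alongside this file, for example:
//   dmd localcopy.d dom.d characterencodings.d
// Helper names, file names, and the sample page are hypothetical.
import arsd.dom;
import std.file : readText, write;
import std.stdio : writeln;

/// Returns the content attribute of <meta name="description">, or null.
string metaDescription(Document document)
{
    foreach (meta; document.querySelectorAll("meta"))
        if (meta.getAttribute("name") == "description")
            return meta.getAttribute("content");
    return null;
}

/// Prints some meta-information and text, edits the tree,
/// and returns the modified page as a string.
string makeLocalCopy(string html)
{
    auto document = new Document();
    document.parseGarbage(html);

    // Inspect: title, description, and the text of each paragraph.
    writeln("Title: ", document.title);
    writeln("Description: ", metaDescription(document));
    foreach (paragraph; document.querySelectorAll("p"))
        writeln("Paragraph: ", paragraph.innerText);

    // Modify: retitle the page and mark every link as nofollow.
    document.title = "Local copy: " ~ document.title;
    foreach (link; document.querySelectorAll("a"))
        link.setAttribute("rel", "nofollow");

    return document.toString();
}

void main(string[] args)
{
    // Read the page named on the command line, or fall back to a tiny sample.
    string html = args.length > 1 ? readText(args[1])
        : `<html><head><title>Sample</title>`
          ~ `<meta name="description" content="A test page"></head>`
          ~ `<body><p>Hello, <a href="x.html">world</a>!</p></body></html>`;

    write("local-copy.html", makeLocalCopy(html));
}
```

Run it with the path of a saved page as the first argument, or with no arguments to process the built-in sample; either way, the modified markup is written to local-copy.html.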

Getting ready

Download dom.d and characterencodings.d from my GitHub repository. Apart from characterencodings.d, dom.d has no dependencies, so you do not need to download any other files or libraries.

How to do it…

Let's execute the following steps to parse and modify an HTML page:

  1. Import arsd.dom.

  2. Create an instance of the Document class.

  3. Pass an unvalidated HTML string to the parseGarbage method or, if you want strict checks on case and well-formedness, use parseStrict instead. parseStrict will throw exceptions when it encounters bad...