Book Image

ModSecurity 2.5

Book Image

ModSecurity 2.5

Overview of this book

With more than 67% of web servers running Apache and web-based attacks becoming more and more prevalent, web security has become a critical area for web site managers. Most existing tools work on the TCP/IP level, failing to use the specifics of the HTTP protocol in their operation. Mod_security is a module running on Apache, which will help you overcome the security threats prevalent in the online world. A complete guide to using ModSecurity, this book will show you how to secure your web application and server, and does so by using real-world examples of attacks currently in use. It will help you learn about SQL injection, cross-site scripting attacks, cross-site request forgeries, null byte attacks, and many more so that you know how attackers operate. Using clear, step-by-step instructions this book starts by teaching you how to install and set up ModSecurity, before diving into the rule language with examples. It assumes no prior knowledge of ModSecurity, so as long as you are familiar with basic Linux administration, you can start to learn right away. Real-life case studies are used to illustrate the dangers on the Web today ñ you will for example learn how the recent worm that hit Twitter works, and how you could have used ModSecurity to stop it in its tracks. The mechanisms behind these and other attacks are described in detail, and you will learn everything you need to know to make sure your server and web application remain unscathed on the increasingly dangerous web. Have you ever wondered how attackers figure out the exact web server version running on a system? They use a technique called HTTP fingerprinting, and you will learn about this in depth and how to defend against it by flying your web server under a "false flag". The last part of the book shows you how to really lock down a web application by implementing a positive security model that only allows through requests that conform to a specific, pre-approved model, and denying anything that is even the slightest bit out of line.
Table of Contents (17 chapters)
ModSecurity 2.5
Credits
About the Author
About the Reviewers
Preface
Directives and Variables
Index

Our email address regex


At the beginning of the chapter I introduced a regular expression for extracting email addresses from web pages. As promised, let's use our newfound knowledge of regexes to see exactly how it works. Here, again, is the regular expression as it was presented in the beginning of the chapter:

\b[-\w.+]+@[\w.]+\.[a-zA-Z]{2,4}\b

We noted that an email address consists of a username, @ character, and domain name. The first part of the regex is \b, which makes sure that the email address starts at a word boundary. Following that, we see that the [-\w.+] character class allows for a word character as well as a dash, dot, or a plus sign. In this case, the dot does not need to be escaped as it is contained within a character class. Also worth noting is that the plus sign inside the character class is also interpreted as a literal plus and not as a repetition quantifier. There is another plus sign immediately following the character class, and this is an actual plus quantifier that is used to match against one or more occurrences of the characters within the character class.

Following this, the @ character is matched literally, as it is a requirement for it to be present in an email address. After this the same character class as before, [\w.]+ is used to allow an arbitrary number of sub-domains (for example, misec.net and support.misec.net are both allowed using this construct).

The second-to-last part of the regular expression is \.[a-zA-Z]{2,4}, and this corresponds to the top-level domain in the email address (such as .com). We see how the dot is required (and is escaped, so that it only matches the dot and not any character). Following this, a letter is required from two up to four times—this allows it to match top-level domains such as de and com and also four-letter domains such as info. Finally, the last part of the regex is another \b word-boundary assertion, to make sure the email address precedes a space or similar word-boundary marker.