Penetration Testing with the Bash shell

Getting to know grep

The Global Regular Expression Print (grep) utility is a staple for all command-line jockeys. In its most basic form, grep lets you match a regular expression against a given input file or stream and prints the matching lines. More advanced features let you choose which parts of the matching text to print, whether the output should be colorized, and even how many lines of context around each match to print. It's packed with useful features, and once mastered, they become an essential part of any penetration tester's, developer's, or system administrator's arsenal.
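As a first taste, the following minimal sketch (assuming bash and GNU grep) runs grep over a few lines of made-up log data. The sample lines are hypothetical; -n prefixes each match with its line number, -A 1 prints one line of trailing context after each match, and --color=auto highlights matches only when the output is a terminal.

```shell
#!/usr/bin/env bash
# Hypothetical log lines used as input.
sample='user alice logged in
user bob failed password
user bob logged in
user carol failed password'

# Print every line containing "failed", prefixed with its line number (-n).
printf '%s\n' "$sample" | grep -n 'failed'

# Print each match plus one line of trailing context (-A 1).
printf '%s\n' "$sample" | grep -A 1 'bob failed'

# Highlight matches, but only when stdout is a terminal (--color=auto).
printf '%s\n' "$sample" | grep --color=auto 'logged'
```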

Tip

To properly make use of grep, you will need at least a basic understanding of, and some practice with, regular expressions. Regular expressions will not be covered in their entirety here, though simple examples and the basic elements of the regular expression language will be. For more extensive reading on regular expressions and how they work, see the Further reading section at the end of the chapter.

Regular expression language – a crash course

Regular expressions are merely strings that describe a collection of strings using a special language; in formal language theory terms, any collection or set of strings is termed a language. Being able to wield this language is an invaluable skill. It will help you with many tasks, from static source code analysis, reverse engineering, and malware fingerprinting to larger vulnerability assessments and exploit development.

The regular expression language supported by grep is filled with useful shorthands that simplify describing sets of common strings, for instance, a string consisting of any decimal digit, any lowercase or uppercase alphabetic character, or even any printable character. Since any string is composed of smaller strings (ultimately, single characters), if you know how to describe any alphabetic character or any decimal digit, you can describe anything composed of characters from those character classes. A character class is simply a language composed of strings of length 1 drawn from a specific collection of characters.

First of all, we need to define some "control" characters. Given that you will be describing strings using other strings, there needs to be a way to give special meaning to certain characters or substrings in your regular expression. Otherwise, all you'd be able to do is compare one string to another, character by character. The control characters are as follows (the repetition, grouping, and OR characters behave as described here in extended regular expressions, which you select with grep -E):

^: The following regular expression must be matched at the beginning of a line, for example, ^this is the start of the line.
$: The preceding regular expression must be matched at the end of a line, for example, this is the end of the line$.
[]: The brackets contain a character class, that is, a list of characters, and the bracket expression matches any single character in that list; for example, [aeiou] matches one lowercase vowel, and a hyphen denotes a range, as in [0-9]. Certain character classes can be described using shorthands. We will see some of them throughout the rest of the chapter.
(): This logically groups regular expressions together.
|: This is a logical OR of two regular expressions, for instance, (expression1)|(expression2).
?: This matches the preceding regular expression zero times or once. For example, keith? applies only to the "h", so it matches both "keit" and "keith".
+: This matches the preceding regular expression at least once.
{n}: This matches the preceding regular expression exactly n times.
{n,m}: This matches the preceding regular expression at least n times and at most m times. For example, [0-9]{1,10} matches a run of between 1 and 10 decimal digits.
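To see the control characters in action, here is a small sketch, assuming bash and GNU grep with -E (extended regular expressions). The word list is made up purely for illustration; each command is annotated with the lines it prints.

```shell
#!/usr/bin/env bash
# Hypothetical word list, one entry per line.
words='keit
keith
keithy
root
toor
12345
123456789012'

# ^ and $ anchor the pattern to the start and end of the line.
printf '%s\n' "$words" | grep -E '^root$'         # root

# ? makes the preceding item (only the "h") optional.
printf '%s\n' "$words" | grep -E '^keith?$'       # keit, keith

# () groups and | is a logical OR.
printf '%s\n' "$words" | grep -E '^(root|toor)$'  # root, toor

# + means "at least once".
printf '%s\n' "$words" | grep -E '^to+r$'         # toor

# {n,m} bounds the repetition: 1 to 10 digits and nothing else on the line.
printf '%s\n' "$words" | grep -E '^[0-9]{1,10}$'  # 12345
```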

The following are a few of the character class shorthands (POSIX character classes) that grep supports. Note that they must appear inside a bracket expression, so you write [[:digit:]] rather than [:digit:]:

[:alnum:]: This matches alphanumeric characters, that is, any decimal digit or alphabetic character
[:alpha:]: This matches strictly alphabetic characters (a-z and A-Z in the C locale)
[:digit:]: This strictly matches the decimal digits 0-9
[:punct:]: This matches any punctuation character

There are a number of other character class shorthands available; see the manual page for grep for more information.

Regular expressions are simply combinations of these control characters, character classes, and literal characters. You can combine them in any way you like, as long as all the brackets, braces, and parentheses are balanced.
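The sketch below, assuming bash and GNU grep, combines character classes with the control characters to pick apart some hypothetical /etc/passwd-style lines. Note the double brackets: the POSIX class names only work inside a bracket expression.

```shell
#!/usr/bin/env bash
# Hypothetical /etc/passwd-style lines.
passwd='root:x:0:0:root:/root:/bin/bash
www-data:x:33:33:www-data:/var/www:/usr/sbin/nologin
backup1:x:34:34:backup:/var/backups:/usr/sbin/nologin
keith:x:1000:1000:Keith:/home/keith:/bin/bash'

# Usernames made of letters followed by one or more digits.
# (Write [[:digit:]]; recent GNU grep versions reject a bare [:digit:].)
printf '%s\n' "$passwd" | grep -E '^[[:alpha:]]*[[:digit:]]+:'

# Usernames (first field) that contain a punctuation character.
printf '%s\n' "$passwd" | cut -d: -f1 | grep '[[:punct:]]'

# [^:] matches any character except a colon, and {4,} means "four or more",
# so this prints lines whose third field (the UID) has at least four digits.
printf '%s\n' "$passwd" | grep -E '^[^:]+:[^:]*:[[:digit:]]{4,}:'
```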

Now that you have some basic background in regular expressions, let's look at the grep utility's usage specification:

grep [options] PATTERN [file list]
[options] := [matcher selection][matching control][output control][file selection][other]
PATTERN := a pattern used to match with content in the file list.
[matcher selection] := [-E|--extended-regexp][-F|--fixed-strings]...
[matching control] := [-e|--regexp][-f|--file][-i|--ignore-case]... 
[output control] := [-c|--count][-L|--files-without-match]...
[file selection] := [-a | --text][--binary-files=TYPE][--exclude]...
[file list] := [file name] [file name] ... [file name]

Please remember this is a mere summary of the structure of the command and does not mention all possible options. For more information about the grep utility's regular expression syntax, please see the Further reading section at the end of this chapter, as well as the man page for Perl-compatible regular expressions, which you can open by executing the command man 3 pcresyntax (Kali Linux might not have this man page installed). You can also learn more about regular expressions from the man page on POSIX.2 regular expressions, which you can open using the command man 7 regex.
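To map the specification onto a real command line, here is a minimal sketch, assuming bash and GNU grep. It creates two hypothetical log files and uses one option from each of the matcher selection (-E), matching control (-i), and output control (-c) groups, followed by PATTERN and the file list.

```shell
#!/usr/bin/env bash
# Two hypothetical log files in a throwaway directory.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf 'Root login OK\nuser login OK\n' > "$dir/a.log"
printf 'ROOT login FAILED\nroot login FAILED\n' > "$dir/b.log"

# grep [matcher selection] [matching control] [output control] PATTERN [file list]
#        -E                  -i                 -c
grep -E -i -c '^root login' "$dir/a.log" "$dir/b.log"
```

With more than one file in the list, grep prefixes each count with the file name, so this reports 1 for a.log and 2 for b.log.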

Building on this specification, let's look at some of the options in detail.

Regular expression matcher selection options

When you invoke grep, you can tell it which method to use to match your pattern against the contents of the file. This choice exists because grep can do more than run one flavor of regular expression; it can also search for fixed strings. If you select nothing, grep interprets the pattern as a basic regular expression.

The following are the options for matcher selection:

-E or --extended-regexp: This interprets the PATTERN argument as an extended regular expression
Note
Extended regular expression language is pretty much what everyone uses today, but this wasn't always the case. Way back in Unix's heyday, regular expressions were written using what is now called the POSIX (Portable Operating System Interface) basic regular expression language. Some years later, Unix developers added functionality to the regular expression language, and a new standard for this more shorthand-laden language was created, called the Extended Regular Expression (ERE) language standard. grep still reads patterns as basic regular expressions unless told otherwise, so without -E, the characters ?, +, {, |, (, and ) are matched literally; GNU grep gives them their special meaning only when they are preceded by a backslash.
-F or --fixed-strings: This tells grep to interpret PATTERN as a list of fixed strings, separated by newlines, to look for in the given file list
For example, the following screenshot shows the output of this command:
-P or --perl-regexp: This allows grep to interpret PATTERN as a Perl-compatible regular expression
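The difference between the matcher selection options is easiest to see side by side. This sketch assumes bash and GNU grep and uses a few hypothetical input lines; -P is left out because some grep builds are compiled without Perl-compatible regular expression support.

```shell
#!/usr/bin/env bash
# Hypothetical input lines.
input='price: $5.00
price: 5x00
error (code 42)'

# Default (basic regex): "." matches any character, so both price lines match.
printf '%s\n' "$input" | grep '5.00'

# -F treats the pattern literally: only the line with a real "." matches.
printf '%s\n' "$input" | grep -F '5.00'

# In a basic regex an unescaped "(" is literal; with -E it starts a group,
# so a literal parenthesis has to be escaped.
printf '%s\n' "$input" | grep '(code [0-9]*)'
printf '%s\n' "$input" | grep -E '\(code [0-9]+\)'

# -F also accepts several newline-separated fixed strings in one PATTERN.
printf '%s\n' "$input" | grep -F "$(printf '$5.00\nerror (')"
```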

Regular expression matching control options

The following options give you some control over how the data being matched is treated: whether to match whole words or whole lines of your input, or to read a number of patterns from a given file.

The following are the options for matching control:

-e PATTERN or --regexp=PATTERN: This uses PATTERN as the pattern to match against the input files. It is useful for patterns that begin with a hyphen, and it can be given more than once to search for several patterns at the same time.
The following command is an example of the usage of the preceding option:
```
grep -e '^root' /etc/passwd
```
The preceding example matches any line that starts with root.
-f FILE or --file=FILE: This reads the patterns to use from the supplied file, one pattern per line.
For example, consider a file containing the following text:
```
^root
^www
^nobody
```
This file can be used with the -f option as follows:
```
grep -f patterns.txt < /etc/passwd
```
-v or --invert-match: This inverts the matching, which means only lines that don't match are selected or reported.
-w or --word-regexp: This reports only lines from the input files in which the pattern matches a whole word.
For example, see the output of the following commands:
```
root@kali:~# grep -w r < /etc/passwd

root@kali:~# grep -w ro < /etc/passwd

root@kali:~# grep -w root < /etc/passwd
root:x:0:0:root:/root:/bin/bash
```
As you can see from the previous output, and maybe some of your own testing, in the first two runs the pattern does not match a complete word anywhere in the /etc/passwd file. The last run does, so it's the only one that actually produces output.
-x or --line-regexp: This reports only lines that the pattern matches in their entirety.
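Here is a sketch, assuming bash and GNU grep, that exercises the matching control options against a small hypothetical passwd-style file written to a temporary directory. The patterns file reuses the three patterns shown above.

```shell
#!/usr/bin/env bash
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

# Hypothetical passwd-style file.
cat > "$dir/passwd" <<'EOF'
root:x:0:0:root:/root:/bin/bash
nobody:x:65534:65534:nobody:/nonexistent:/usr/sbin/nologin
www-data:x:33:33:www-data:/var/www:/usr/sbin/nologin
rooter:x:1001:1001::/home/rooter:/bin/sh
EOF

# -e may be repeated; lines matching any of the patterns are printed.
grep -e '^root:' -e '^nobody:' "$dir/passwd"

# -f reads one pattern per line from a file.
printf '^root\n^www\n^nobody\n' > "$dir/patterns.txt"
grep -f "$dir/patterns.txt" "$dir/passwd"

# -v inverts the match: accounts whose shell is not nologin.
grep -v 'nologin$' "$dir/passwd"

# -w: "root" must be a whole word, so the rooter line is not reported.
grep -w 'root' "$dir/passwd"

# -x: the pattern has to match the entire line.
grep -x 'root' "$dir/passwd" || echo 'no line is exactly "root"'
grep -x 'root:.*' "$dir/passwd"
```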

Output control options

The grep utility also allows you to control how it reports information about successful matches. You can also specify which attributes of the matches to report on.

The following are some of the output control options:

-c or --count: Instead of printing the matching lines, this prints the number of matching lines for each input file.
-L or --files-without-match: This prints only the names of files that contain no matches.
-l or --files-with-matches: This prints only the names of files that contain matches.
-m NUM or --max-count=NUM: This stops reading a file after NUM matching lines. The same applies when input comes from standard input, for example, through an input redirection: grep stops after NUM matching lines.
-o or --only-matching: This prints only the matched parts of each matching line, each on a separate line.
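The following sketch, assuming bash and GNU grep, shows the output control options on three hypothetical configuration files created in a temporary directory. The file names and contents are invented for the example.

```shell
#!/usr/bin/env bash
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

# Hypothetical files: two mention "password", one does not.
printf 'db_password=hunter2\napi_password=s3cret\n' > "$dir/app.conf"
printf 'password=letmein\n' > "$dir/old.conf"
printf 'debug=true\n' > "$dir/misc.conf"

# -c: number of matching lines in the file.
grep -c 'password' "$dir/app.conf"            # 2

# -l / -L: names of the files with / without a match.
grep -l 'password' "$dir"/*.conf
grep -L 'password' "$dir"/*.conf

# -m 1: stop after the first matching line.
grep -m 1 'password' "$dir/app.conf"          # db_password=hunter2

# -o: print only the matched text, one match per line.
grep -o -E 'password=[^[:space:]]+' "$dir/app.conf"
```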

File selection options

The following options allow you to specify where the input files should come from and also control some of the attributes of the input data as a whole.

The following are the options for file selection:

-a or --text: This forces binary files to be processed as text. This lets you use grep much like the strings utility, which prints all the printable strings in a given file, with the added benefit of being able to match those strings using regular expressions.
For example:
```
grep --text --color -m 1 'printf' `which echo`
```
Note
The which command
The which command prints the full path of the executable that would be run for the supplied command name, found by searching the directories in PATH. Here, it appears in back-ticks, so the bash shell substitutes the command's output in its place, which effectively means grep will be running through the binary for the echo command.
The output of the previous command is as shown in the following screenshot:
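--binary-files=TYPE: If a file's data or metadata indicate that it contains binary data, this treats the file as type TYPE. TYPE can be binary (the default), text (equivalent to -a), or without-match (the file is assumed not to match).
-D ACTION or --devices=ACTION: If an input file is a device, FIFO, or socket, this uses ACTION to process it. By default, ACTION is read, which means devices are read just as if they were ordinary files; skip makes grep silently skip them.
--exclude=GLOB: This skips any file whose name matches GLOB, using wildcard matching, so *, ?, and [...] are honored.
-r or --recursive: This reads all files under each directory, recursively; if no file is given, grep searches the current working directory. -R or --dereference-recursive does the same but also follows all symbolic links.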
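Finally, here is a sketch of the file selection options, assuming bash and GNU grep. It builds a hypothetical directory tree containing two text files and one binary blob (the NUL bytes are what make grep treat it as binary), then searches it recursively. The AKIA strings are arbitrary placeholders, not real keys.

```shell
#!/usr/bin/env bash
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
mkdir -p "$dir/src/lib" "$dir/build"

# Hypothetical project tree: two source files and one binary blob.
printf 'key = "AKIA-TEST"\n' > "$dir/src/config.py"
printf '# no secrets here\n' > "$dir/src/lib/util.py"
printf 'hdr\000key = "AKIA-BIN"\000tail' > "$dir/build/blob.bin"

# -r walks the tree; --exclude skips files whose name matches the glob.
grep -r --exclude='*.bin' 'AKIA' "$dir"

# Treat binary files as if they did not match at all.
grep -r --binary-files=without-match 'AKIA' "$dir"

# -a (--text) reads the blob as text, like strings plus a regex;
# -o keeps the output to the matched bytes only.
grep -a -o 'AKIA-[A-Z]*' "$dir/build/blob.bin"
```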

Well, that's pretty much it as far as grep goes. Hopefully, you'll be able to make use of these options to find what you're looking for. It takes a little practice and getting used to, but once mastered, grep is an invaluable utility.

By : Keith Harald Esrick Makan
