The Global Regular Expression Print (grep) utility is a staple for all command-line jockeys. In its most basic form, grep lets you run a regular expression against a given input file or stream and prints the matching lines. More advanced features allow you to specify which attributes of the matching text you'd like to print, whether you'd like the output colorized, or even how many lines around the matching output should be printed. It's packed with useful features, and once mastered it becomes an essential part of any penetration tester's, developer's, or system administrator's arsenal.
Tip
To make proper use of grep, you will need at least a basic understanding of, and some practice with, regular expressions. Regular expressions will not be covered in their entirety here, though simple examples and the basic elements of the regular expression language will be. For more extensive reading on regular expressions and how they work, see the Further reading section at the end of the chapter.
Regular expressions are merely strings that describe a collection of strings using a special language. (In formal language theory terms, any collection or set of strings is termed a language.) Being able to wield this language is an invaluable skill. It will help you with many tasks, from static source code analysis, reverse engineering, and malware fingerprinting to larger vulnerability assessments and exploit development.
The regular expression language supported by grep is filled with useful shorthands that simplify the description of common sets of strings, for instance, a string consisting of any decimal digit, any lowercase or uppercase alphabetic character, or even any printable character. Since any string is composed of smaller strings, if you know how to match or describe any alphabetic character or any decimal digit, you should be able to describe anything composed of characters from those character classes. A character class is simply a language composed of length-1 strings drawn from a specific collection of characters.
First of all, we need to define some "control" characters. Given that you will be describing strings using other strings, there needs to be a way to give special meaning to certain characters or substrings in your regular expression. Otherwise, all you'd be able to do is compare one string to another, character by character. The control characters are as follows:
^: The regular expression that follows must be matched at the beginning of a line, for example, ^this is the start of the line.
$: The preceding regular expression must be matched at the end of a line, for example, this is the end of the line$.
[]: The description of a character class, or a list of characters, is contained within the brackets, and strings that match contain characters in the specified list. Certain character classes can be described using shorthands. We will see some of them throughout the rest of the chapter.
|: This is a logical OR of two regular expressions, for instance, ([expression])|([expression]).
?: This matches the preceding regular expression at most once, that is, zero or one time. For example, keith? matches both "keit" and "keith", because the ? applies only to the h that precedes it.
+: This matches the preceding regular expression at least once.
{n}: This matches the preceding regular expression exactly n times.
{n,m}: This matches the preceding regular expression at least n times and at most m times. For example, [0-9]{0,10} will match any run of between 0 and 10 decimal digits.
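The control characters above can be tried out with a short sketch like the following. The sample lines are made up purely for illustration, and -E is used so that ?, {n,m}, and | carry their special meaning without backslashes.

```shell
# Hypothetical sample input, chosen only to exercise each control character.
sample=$(printf '%s\n' 'root login' 'the root' 'keit' 'keith' 'id 12345' 'id 7')

# ^ anchors the match at the start of a line.
starts=$(printf '%s\n' "$sample" | grep -E '^root')

# $ anchors the match at the end of a line.
ends=$(printf '%s\n' "$sample" | grep -E 'root$')

# ? makes the preceding h optional, so both keit and keith match.
optional=$(printf '%s\n' "$sample" | grep -E '^keith?$')

# {n,m} bounds repetition: a space followed by 2 to 5 digits at end of line.
bounded=$(printf '%s\n' "$sample" | grep -E ' [0-9]{2,5}$')

# | is alternation: lines beginning with either keit or the.
either=$(printf '%s\n' "$sample" | grep -E '^(keit|the)')

printf '%s\n' "$starts" "$ends" "$optional" "$bounded" "$either"
```

Anchoring the patterns with ^ and $ keeps each result easy to reason about, because grep otherwise reports any line that contains a match anywhere.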
The following is a small collection of some of the shorthands grep supports as part of its extended regular expression language:
There are a number of other character class shorthands available; see the manual page for grep for more information.
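As a small illustration of character class shorthands, the sketch below uses a few of the standard POSIX bracket classes ([:digit:], [:upper:], [:alpha:], and [:space:]). These are not necessarily the exact shorthands the chapter has in mind, and the sample input is invented for the example.

```shell
# Hypothetical sample input.
sample=$(printf '%s\n' 'abc' 'ABC' '123' 'a1 b2')

# Lines made up entirely of decimal digits.
digits_only=$(printf '%s\n' "$sample" | grep -E '^[[:digit:]]+$')

# Lines made up entirely of uppercase letters.
upper_only=$(printf '%s\n' "$sample" | grep -E '^[[:upper:]]+$')

# Lines made up entirely of alphabetic characters, in either case.
alpha_only=$(printf '%s\n' "$sample" | grep -E '^[[:alpha:]]+$')

# Lines containing at least one whitespace character.
with_space=$(printf '%s\n' "$sample" | grep -E '[[:space:]]')

printf '%s\n' "$digits_only" "$upper_only" "$alpha_only" "$with_space"
```

Note the doubled brackets: the outer pair opens the bracket expression and the inner [:name:] selects the class within it.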
Regular expressions are simply collections of these control characters and character classes. You can combine them in any way you like, as long as all the brackets, braces, and parentheses are balanced.
Now that you have some basic background in regular expressions, let's look at the grep utility's usage specification:
grep [options] PATTERN [file list]
[options] := [matcher selection][matching control][output control][file selection][other]
PATTERN := a pattern used to match with content in the file list.
[matcher selection] := [-E|--extended-regexp][-F|--fixed-strings]...
[matching control] := [-e|--regexp][-f|--file][-i|--ignore-case]...
[output control] := [-c|--count][-L|--files-without-match]...
[file selection] := [-a|--text][--binary-files=TYPE][--exclude]...
[file list] := [file name] [file name] ... [file name]
Please remember that this is a mere summary of the structure of the command and does not mention all possible options. For more information about the regular expression syntax grep supports, please see the Further reading section at the end of this chapter, as well as the man page for Perl regular expressions, which can be reached by executing the command man 3 pcresyntax. Kali Linux might not have that man page installed; in that case, you can learn more from the man page on POSIX.2 regular expressions, which you can open with the command man 7 regex.
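To see how a concrete invocation lines up with the usage specification, consider the rough sketch below. It picks one option from each of three groups (-E for matcher selection, -i for matching control, -c for output control) and runs against a throwaway file whose contents are invented for the example.

```shell
# Hypothetical input file created only for this example.
tmpfile=$(mktemp)
printf '%s\n' 'Root entry' 'root entry' 'other' > "$tmpfile"

# grep [matcher selection] [matching control] [output control] PATTERN [file list]
#      -E                  -i                 -c               'ro+t'  "$tmpfile"
count=$(grep -E -i -c 'ro+t' "$tmpfile")
echo "$count"

rm -f "$tmpfile"
```

Because -i ignores case, both "Root entry" and "root entry" are counted.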
Building on this specification, let's look at some of the options in detail.
Part of the invocation of grep requires you to let grep know what method you would like to use to match your pattern with the contents of the file. This is because grep is capable of more than just running regular expressions.
The following are the options for matcher selection:
-E or --extended-regexp: This interprets the PATTERN argument as an extended regular expression.
Note
Extended regular expression language is pretty much what everyone uses today, but this wasn't always the case. Way back in Unix's heyday, regular expressions were represented using something called POSIX (Portable Operating System Interface) basic regular expression language. Some years later, Unix developers added some functionality to the regular expression language, and a new standard for representing this more shorthand-laden language was created, called the Extended Regular Expression (ERE) language standard.
-F or --fixed-strings: This tells grep to interpret PATTERN as a list of fixed strings, separated by newlines, to look for in the given file list.
-P or --perl-regexp: This allows grep to interpret PATTERN as a Perl regular expression.
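The difference between the matchers is easiest to see when the same pattern is interpreted in two ways. The following is a minimal sketch with made-up input; -P is left out because its availability depends on how grep was built.

```shell
# Hypothetical sample input.
sample=$(printf '%s\n' 'a.c' 'abc' 'a+b')

# With -E, the dot is a regular expression metacharacter matching any character.
as_regex=$(printf '%s\n' "$sample" | grep -E 'a.c')

# With -F, the dot is a literal dot, so only the line containing "a.c" matches.
as_fixed=$(printf '%s\n' "$sample" | grep -F 'a.c')

# With -F, a newline-separated PATTERN is treated as a list of fixed strings.
fixed_list=$(printf '%s\n' "$sample" | grep -F "$(printf 'a.c\na+b')")

printf '%s\n' "$as_regex" "$as_fixed" "$fixed_list"
```

Fixed-string matching is handy when searching for text that happens to contain regular expression metacharacters, since nothing needs escaping.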
The following options allow you to control a little about how the data being matched should be treated, whether you'd like to match whole words in your input or whole lines or funnel in a number of patterns from a given file.
The following are the options for matching control:
-e PATTERN or --regexp=PATTERN: This uses the PATTERN argument supplied here as the pattern to match against the input files. It is useful for supplying several patterns or a pattern that begins with a hyphen. The following command is an example of the usage for the preceding option:
cat /etc/passwd | grep -e '^root'
The preceding example matches lines that start with the word root.
-f or --file=FILE: This grabs a list of patterns, one per line, from the supplied file. For example, consider a file containing the following text:
^root
^www
^nobody
This file can be used with the -f option as follows:
grep -f patterns.txt < /etc/passwd
-v or --invert-match: This inverts the matching, which means that only lines that don't match are selected and reported.
-w or --word-regexp: This reports lines from the input files that contain the match as a whole word. For example, see the output of the following commands:
root@kali:~# grep r -w < /etc/passwd
root@kali:~# grep ro -w < /etc/passwd
root@kali:~# grep root -w < /etc/passwd
root:x:0:0:root:/root:/bin/bash
As you can see from the previous output, and maybe some of your own testing, the first two patterns do not describe a complete word in the contents of the /etc/passwd file. However, the last one does, so it's the only run that actually produces output.
-x or --line-regexp: This reports only lines from the input file that match the pattern in their entirety.
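The matching control options can be exercised together on a small stand-in for /etc/passwd. The sketch below is only illustrative: the file passwd.sample and its entries are invented, and patterns.txt mirrors the pattern file described above.

```shell
# Hypothetical working directory and sample files.
workdir=$(mktemp -d)
printf '%s\n' \
  'root:x:0:0:root:/root:/bin/bash' \
  'www-data:x:33:33:www-data:/var/www:/usr/sbin/nologin' \
  'nobody:x:65534:65534:nobody:/nonexistent:/usr/sbin/nologin' \
  'keith:x:1000:1000::/home/keith:/bin/bash' > "$workdir/passwd.sample"
printf '%s\n' '^root' '^www' '^nobody' > "$workdir/patterns.txt"

# -f reads one pattern per line from a file; cut keeps only the user name.
from_file=$(grep -f "$workdir/patterns.txt" "$workdir/passwd.sample" | cut -d: -f1)

# -v inverts the match: only lines matching none of the patterns remain.
inverted=$(grep -v -f "$workdir/patterns.txt" "$workdir/passwd.sample" | cut -d: -f1)

# -e may be repeated to supply several patterns on the command line.
multi=$(grep -e '^root' -e '^keith' "$workdir/passwd.sample" | cut -d: -f1)

# -w requires a whole-word match: "ro" is not a word here, but "root" is.
# grep exits non-zero when nothing matches, hence the "|| true".
word_ro=$(grep -c -w 'ro' "$workdir/passwd.sample" || true)
word_root=$(grep -c -w 'root' "$workdir/passwd.sample")

# -x requires the whole line to match the pattern.
whole_line=$(printf '%s\n' 'root' 'root user' | grep -x 'root')

printf '%s\n' "$from_file" "$inverted" "$multi" "$word_ro" "$word_root" "$whole_line"
rm -rf "$workdir"
```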
The grep utility also allows you to control how it reports information about successful matches. You can also specify which attributes of the matches to report on.
The following are some of the output control options:
-c or --count: This doesn't print the matched data; instead, it prints the number of matching lines.
-L or --files-without-match: This prints only the names of files that contain no matches.
-l or --files-with-matches: This prints only the names of files that contain matches.
-m or --max-count=NUM: This stops processing a file after NUM matching lines. The same limit applies when the input comes from standard input or an input redirection.
-o or --only-matching: This prints only the matching parts of the input data, each on a separate line.
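A quick way to get a feel for the output control options is to run them against a pair of small files. In this sketch, a.log and b.log and their contents are hypothetical.

```shell
# Hypothetical log files created only for this example.
workdir=$(mktemp -d)
printf '%s\n' 'error: disk' 'ok' 'error: net' 'error: cpu' > "$workdir/a.log"
printf '%s\n' 'ok' 'ok' > "$workdir/b.log"

# -c prints the number of matching lines instead of the lines themselves.
count=$(grep -c 'error' "$workdir/a.log")

# -l lists files with at least one match; -L lists files with none.
with_match=$(cd "$workdir" && grep -l 'error' a.log b.log)
without_match=$(cd "$workdir" && grep -L 'error' a.log b.log)

# -m 2 stops reading the file after two matching lines.
first_two=$(grep -m 2 'error' "$workdir/a.log")

# -o prints each matching part on its own line rather than the whole line.
ips=$(printf '%s\n' 'from 10.0.0.1 to 10.0.0.2' | grep -o -E '[0-9]+(\.[0-9]+){3}')

printf '%s\n' "$count" "$with_match" "$without_match" "$first_two" "$ips"
rm -rf "$workdir"
```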
The following options allow you to specify where the input files should come from and also control some of the attributes of the input data as a whole.
The following are the options for file selection:
-a or --text: This forces binary files to be processed as text. This allows you to operate grep much like the strings utility, which returns all the printable strings from a given file, with the added benefit of being able to match the strings using regular expressions. For example:
grep 'printf' -m 1 --color --text `which echo`
Note
The which command
The which command prints the full file path of the executable named by the supplied argument. Here, it appears in back-ticks so that the bash shell substitutes the command with the value it produces, which effectively means grep will be running through the binary for the echo command.
--binary-files=TYPE: This checks whether a file supplied as input is a binary file. If it is, grep treats the file as the specified TYPE.
-D ACTION or --devices=ACTION: If an input file is a device, FIFO, or socket, this uses ACTION to process it. By default, ACTION is read.
--exclude=GLOB: This skips any files whose names match GLOB; wildcards are honored in the matching.
-R, -r, or --recursive: This recursively processes all the reachable files in nested directories, starting from each directory named in the file list.
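The file selection options can be sketched against a throwaway directory tree. Everything below (the directory layout, the file names, and the word needle) is invented for the example; blob.bin contains a NUL byte so that it resembles a binary file.

```shell
# Hypothetical directory tree created only for this example.
workdir=$(mktemp -d)
mkdir -p "$workdir/src/nested"
printf 'needle here\n' > "$workdir/src/a.txt"
printf 'needle too\n' > "$workdir/src/nested/b.txt"
printf 'needle in log\n' > "$workdir/src/nested/c.log"
printf 'bin\000needle\n' > "$workdir/blob.bin"

# -r descends into nested directories; -l keeps the output to file names.
recursive=$(cd "$workdir" && grep -r -l 'needle' src | sort)

# --exclude skips files whose names match the glob.
filtered=$(cd "$workdir" && grep -r -l --exclude='*.log' 'needle' src | sort)

# -a treats the binary file as text, so -o can print the matched string itself.
as_text=$(grep -a -o 'needle' "$workdir/blob.bin")

printf '%s\n' "$recursive" "$filtered" "$as_text"
rm -rf "$workdir"
```

Quoting the glob passed to --exclude stops the shell from expanding it before grep sees it.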
Well, that's pretty much it as far as grep goes. Hopefully, you'll be able to make use of these options to find what you're looking for. It takes a little practice and getting used to, but once mastered, grep is an invaluable utility.