Object-Oriented JavaScript

By: Stoyan Stefanov

Appendix D. Regular Expressions

When you use regular expressions (discussed in Chapter 4), you can match literal strings, for example:

>>> "some text".match(/me/)

["me"]

But the true power of regular expressions comes from matching patterns, not literal strings. The following table describes the different syntax you can use in your patterns, and provides some examples of their use.

Pattern

Description

[abc]

Matches a class of characters.

>>> "some text".match(/[otx]/g)

["o", "t", "x", "t"]

[a-z]

A class of characters defined as a range. For example, [a-d] is the same as [abcd], [a-z] matches all lowercase letters, and [a-zA-Z0-9_] matches all letters, digits, and the underscore character.

>>> "Some Text".match(/[a-z]/g)

["o", "m", "e", "e", "x", "t"]

>>> "Some Text".match(/[a-zA-Z]/g)

["S", "o", "m", "e", "T", "e", "x", "t"]

[^abc]

Matches any character that is not in the class of characters.

>>> "Some Text".match(/[^a-z]/g)

["S", " ", "T"]
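
A negated class is handy when you want to strip unwanted characters. The following is a minimal sketch rather than anything from the book; the helper name keepAlphanumeric is hypothetical, and it relies on replace() with the g modifier.

```javascript
// Hypothetical helper: remove every character that is NOT an ASCII letter or digit.
function keepAlphanumeric(input) {
  // [^a-zA-Z0-9] matches any single character outside the listed ranges.
  return input.replace(/[^a-zA-Z0-9]/g, "");
}

console.log(keepAlphanumeric("R2-D2 & C-3PO!")); // "R2D2C3PO"
```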

a|b

Matches a or b. The pipe character means OR, and it can be used more than once.

>>> "Some Text".match(/t|T/g);

["T", "t"]

>>> "Some Text".match(/t|T|Some/g);

["Some", "T", "t"]
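
One detail worth keeping in mind is that the alternatives are tried from left to right, and the first one that succeeds at a given position wins. The sketch below (variable names are only illustrative) shows how the order of the alternatives can change the result.

```javascript
const sample = "Some Text";

// "S" is tried first and succeeds, so "Some" never gets a chance.
const shortFirst = sample.match(/S|Some/g);

// Putting the longer alternative first lets it match in full.
const longFirst = sample.match(/Some|S/g);

console.log(shortFirst); // ["S"]
console.log(longFirst);  // ["Some"]
```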

a(?=b)

Matches a only if followed by b.

>>> "Some Text".match(/Some(?=Tex)/g);

null

>>> "Some Text".match(/Some(?= Tex)/g);

["Some"]

a(?!b)

Matches a only when not followed by b.

>>> "Some Text".match(/Some(?! Tex)/g);

null

>>> "Some Text".match(/Some(?!Tex)/g);

["Some"]
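
Both lookaheads check what follows without making it part of the match. The sketch below uses a made-up string of CSS-like values to illustrate this, and also shows a common surprise with (?!...) when it follows a quantifier such as + (described later in this table).

```javascript
const css = "10px 20em 30px";

// Digits followed by "px"; the unit itself is not part of the match.
const pxValues = css.match(/\d+(?=px)/g);

// Pitfall: the engine backtracks, so "1" (followed by "0", not by "px") matches.
const naive = css.match(/\d+(?!px)/g);

// Also forbidding a following digit prevents those partial matches.
const notPx = css.match(/\d+(?!\d|px)/g);

console.log(pxValues); // ["10", "30"]
console.log(naive);    // ["1", "20", "3"]
console.log(notPx);    // ["20"]
```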

\

Escape character used to help you match the special characters used in patterns as literals.

>>> "R2-D2".match(/[2-3]/g)

["2", "2"]

>>> "R2-D2".match(/[2\-3]/g)

["2", "-", "2"]
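
Escaping matters most when a pattern is built from text you do not control. Here is a minimal sketch, assuming you create the pattern with the RegExp constructor; the helper name escapeRegExp is hypothetical and not part of the language.

```javascript
// Hypothetical helper: prefix every regex special character with a backslash.
function escapeRegExp(text) {
  // "$&" in the replacement string stands for the whole matched substring.
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const literal = "1+1=2";
const unescaped = new RegExp(literal);               // "+" acts as a quantifier
const escaped = new RegExp(escapeRegExp(literal));   // "+" is a literal plus sign

console.log(unescaped.test("1+1=2")); // false
console.log(escaped.test("1+1=2"));   // true
```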

\n

New line

\r

Carriage return

\f

Form feed

\t

Tab

\v

Vertical tab

\s

Any white space character: the space itself, any of the five escape sequences above, and other Unicode space characters.

>>> "R2\n D2".match(/\s/g)

["\n", " "]

\S

Opposite of the above; matches everything but white space. Same as [^\s]:

>>> "R2\n D2".match(/\S/g)

["R", "2", "D", "2"]
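
A typical use of \s and \S is cleaning up input. The sketch below is only an illustration with a made-up string; it combines both classes with the + quantifier (described later in this table) to work on runs of characters rather than single ones.

```javascript
const messy = "R2\n   D2\t\tand  C-3PO";

// \s+ matches each run of white space; replace every run with a single space.
const normalized = messy.replace(/\s+/g, " ");

// \S+ matches each run of non-white-space characters.
const chunks = messy.match(/\S+/g);

console.log(normalized); // "R2 D2 and C-3PO"
console.log(chunks);     // ["R2", "D2", "and", "C-3PO"]
```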

\w

Any letter, digit, or underscore. Same as [A-Za-z0-9_].

>>> "Some text!".match(/\w/g)

["S", "o", "m", "e", "t", "e", "x", "t"]

\W

Opposite of \w.

>>> "Some text!".match(/\W/g)

[" ", "!"]

\d

Matches a digit, same as [0-9].

>>> "R2-D2 and C-3PO".match(/\d/g)

["2", "2", "3"]

\D

Opposite of \d; matches non-digits, same as [^0-9] or [^\d].

>>> "R2-D2 and C-3PO".match(/\D/g)

["R", "-", "D", " ", "a", "n", "d", " ", "C", "-", "P", "O"]

\b

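Matches a word boundary: the position between a word character (\w) and a non-word character such as a space or punctuation, or the start or end of the input. The boundary itself has no width, so no character is consumed.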

Matching R or D followed by 2:

>>> "R2D2 and C-3PO".match(/[RD]2/g)

["R2", "D2"]

Same as above but only at the end of a word:

>>> "R2D2 and C-3PO".match(/[RD]2\b/g)

["D2"]

Same pattern, but the input has a dash, which also ends a word:

>>> "R2-D2 and C-3PO".match(/[RD]2\b/g)

["R2", "D2"]

\B

The opposite of \b; matches a position that is not a word boundary.

>>> "R2-D2 and C-3PO".match(/[RD]2\B/g)

null

>>> "R2D2 and C-3PO".match(/[RD]2\B/g)

["R2"]
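
Neither \b nor \B matches a character; both match a position. One way to see this is to insert a visible marker at every such position, as in the following sketch (the sample strings are made up for illustration).

```javascript
// Insert a marker at every word boundary.
const atBoundaries = "R2-D2".replace(/\b/g, "|");

// Insert a marker at every position that is not a word boundary.
const atNonBoundaries = "R2-D2".replace(/\B/g, "|");

// Wrapping a pattern in \b...\b matches it only as a whole word.
const wholeWords = "cat concat cats".match(/\bcat\b/g);

console.log(atBoundaries);    // "|R2|-|D2|"
console.log(atNonBoundaries); // "R|2-D|2"
console.log(wholeWords);      // ["cat"]
```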

[\b]

Matches the backspace character.

\0

Matches the null character.

\u0000

Matches a Unicode character, represented by a four-digit hexadecimal number.

>>> "стоян".match(/\u0441\u0442\u043E/)

["сто"]

\x00

Matches a character code represented by a two-digit hexadecimal number.

>>> "dude".match(/\x64/g)

["d", "d"]

^

The beginning of the string to be matched. If you set the m modifier (multi-line), it matches the beginning of each line.

>>> "regular\nregular\nexpression".match(/r/g);

["r", "r", "r", "r", "r"]

>>> "regular\nregular\nexpression".match(/^r/g);

["r"]

>>> "regular\nregular\nexpression".match(/^r/mg);

["r", "r"]

$

Matches the end of the input or, when using the multi-line modifier, the end of each line.

>>> "regular\nregular\nexpression".match(/r$/g);

null

>>> "regular\nregular\nexpression".match(/r$/mg);

["r", "r"]
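
With the m modifier, ^ and $ make it easy to pick something from each line of a multi-line string. The sketch below uses a made-up, log-like input and is meant only as an illustration.

```javascript
const lines = "INFO start\nWARN disk low\nINFO done";

// First word of every line (m makes ^ match after each new line).
const firstWords = lines.match(/^\w+/mg);

// Last word of every line (m makes $ match before each new line).
const lastWords = lines.match(/\w+$/mg);

// Without m, only the start and the end of the whole input qualify.
const firstOnly = lines.match(/^\w+/g);
const lastOnly = lines.match(/\w+$/g);

console.log(firstWords); // ["INFO", "WARN", "INFO"]
console.log(lastWords);  // ["start", "low", "done"]
console.log(firstOnly);  // ["INFO"]
console.log(lastOnly);   // ["done"]
```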

.

Matches any single character except line terminators such as the new line (\n) and the carriage return (\r).

>>> "regular".match(/r./g);

["re"]

>>> "regular".match(/r.../g);

["regu"]

*

Matches the preceding pattern if it occurs 0 or more times. For example /.*/ will match anything including nothing (an empty input).

>>> "".match(/.*/)

[""]

>>> "anything".match(/.*/)

["anything"]

>>> "anything".match(/n.*h/)

["nyth"]

?

Matches the preceding pattern if it occurs 0 or 1 times.

>>> "anything".match(/ny?/g)

["ny", "n"]

+

Matches the preceding pattern if it occurs one or more times.

>>> "anything".match(/ny+/g) 

["ny"]

>>> "R2-D2 and C-3PO".match(/[a-z]/gi)

["R", "D", "a", "n", "d", "C", "P", "O"]

>>> "R2-D2 and C-3PO".match(/[a-z]+/gi)

["R", "D", "and", "C", "PO"]
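
Combined with \d, the + quantifier lets you pull whole numbers out of a string rather than single digits. A minimal sketch with a made-up input; note that match() returns null when nothing matches, hence the fallback to an empty array.

```javascript
const order = "12 apples and 7 pears";

// \d+ matches each run of digits; Number converts the strings to numbers.
const quantities = (order.match(/\d+/g) || []).map(Number);

// Add the extracted numbers together.
const total = quantities.reduce((sum, n) => sum + n, 0);

console.log(quantities); // [12, 7]
console.log(total);      // 19
```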

{n}

Matches the preceding pattern if it occurs exactly n times.

>>> "regular expression".match(/s/g)

["s", "s"]

>>> "regular expression".match(/s{2}/g)

["ss"]

>>> "regular expression".match(/\b\w{3}/g)

["reg", "exp"]

{min,max}

Matches the preceding pattern if it occurs between min and max times, inclusive. If you omit max but keep the comma, as in {min,}, there is no maximum, only a minimum. You cannot omit min.

An example where the input is "doodle" with the "o" repeated 10 times:

>>> "doooooooooodle".match(/o/g)

["o", "o", "o", "o", "o", "o", "o", "o", "o", "o"]

>>> "doooooooooodle".match(/o{2}/g)

["oo", "oo", "oo", "oo", "oo"]

>>> "doooooooooodle".match(/o{2,}/g)

["oooooooooo"]

>>> "doooooooooodle".match(/o{2,6}/g)

["oooooo", "oooo"]
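
Together with ^ and $, a {min,max} quantifier can describe the allowed length of a whole input. The following is a sketch under an assumed rule (a user name of 3 to 8 word characters); the function name isValidUsername is hypothetical.

```javascript
// Hypothetical rule: 3 to 8 word characters and nothing else.
function isValidUsername(name) {
  // ^ and $ force the quantified pattern to cover the entire input.
  return /^\w{3,8}$/.test(name);
}

console.log(isValidUsername("r2"));        // false, too short
console.log(isValidUsername("r2d2"));      // true
console.log(isValidUsername("c3po_unit")); // false, nine characters

// Without the anchors, a long input still contains a valid run.
console.log(/\w{3,8}/.test("c3po_unit"));  // true
```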

(pattern)

When the pattern is in parentheses, it is remembered so that it can be used for replacements. Such patterns are known as capturing patterns.

In the replacement string, the captured matches are available as $1, $2,... $9.

Matching all "r" occurrences and repeating them:

>>> "regular expression".replace(/(r)/g, '$1$1')

"rregularr exprression"

Matching "re" and turning it to "er":

>>> "regular expression".replace(/(r)(e)/g, '$2$1')

"ergular experssion"

(?:pattern)

Non-capturing pattern, not remembered and not available in $1, $2...

Here's an example of how "re" is matched, but the "r" is not remembered and the second pattern becomes $1:

>>> "regular expression".replace(/(?:r)(e)/g, '$1$1')

"eegular expeession"

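Capturing and non-capturing patterns are often mixed in the same expression. The sketch below, using made-up inputs, swaps two captured words, reads the captures from the array returned by match(), and uses (?:...) to group alternatives without shifting the numbering.

```javascript
// Swap "Last, First" into "First Last" using $1 and $2.
const swapped = "Stefanov, Stoyan".replace(/(\w+), (\w+)/, "$2 $1");

// Without the g modifier, match() also returns the captured patterns.
const dateParts = "2024-05-17".match(/(\d{4})-(\d{2})-(\d{2})/);

// (?:Mr|Ms) groups the alternatives but is not remembered, so the name is $1.
const greeting = "Ms. Lovelace".replace(/(?:Mr|Ms)\. (\w+)/, "Dear $1");

console.log(swapped);      // "Stoyan Stefanov"
console.log(dateParts[0]); // "2024-05-17"
console.log(dateParts[1]); // "2024"
console.log(greeting);     // "Dear Lovelace"
```
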
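Pay attention when a special character has two meanings, as is the case with ^ (the beginning of the input, or negation inside a class of characters), ? (a quantifier, or part of the (?=...), (?!...), and (?:...) syntax), and \b (a word boundary, or the backspace character inside a class).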
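
As a quick illustration of these double meanings, here is a small sketch that uses each character in both of its roles. The inputs and variable names are made up for the example.

```javascript
// ^ : start-of-input anchor outside a class, negation inside a class.
const caretAnchor = /^a/.test("banana"); // false, input does not start with "a"
const caretNegate = /[^a]/.test("aaa");  // false, every character is "a"

// ? : "0 or 1" quantifier after a pattern, or part of the (?=...) syntax.
const optionalU = /colou?r/.test("color");                // true
const followedBy = "Some Text".match(/Some(?= Text)/)[0]; // "Some"

// \b : word boundary outside a class, backspace character inside a class.
const boundary = /\bText/.test("Some Text");     // true
const noBackspace = /[\b]/.test("Some Text");    // false, no backspace in input
const hasBackspace = /[\b]/.test("Some\bText");  // true, "\b" in a string is a backspace

console.log(caretAnchor, caretNegate, optionalU, followedBy);
console.log(boundary, noBackspace, hasBackspace);
```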