Regular Expressions Mastery

Master pattern matching with regex. Learn syntax, metacharacters, groups, and practical applications for text processing.

12 minute read Advanced

What Are Regular Expressions?

Regular expressions (regex) are powerful patterns used to match, search, and manipulate text. They provide a concise way to describe complex string patterns.

Regex is supported in virtually every programming language including JavaScript, Python, Java, PHP, and more. Learning regex is an essential skill for any developer.

Why Learn Regex?

A single regex pattern can replace dozens of lines of string manipulation code. It's essential for data validation, log parsing, search and replace, and text extraction.

Basic Syntax

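In JavaScript, Perl, and Ruby, a regex literal is written between forward slashes, optionally followed by flags. Languages such as Python and Java pass the pattern as a string instead. This guide uses the slash notation throughout: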

/pattern/flags

Examples:
/hello/      Matches "hello" in any string
/hello/i     Case-insensitive match
/hello/g     Global match (find all occurrences)
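Python has no regex literal, so the slash notation maps onto the `re` module, with flags passed as arguments. A minimal sketch, assuming a sample string invented for illustration:

```python
import re

text = "Hello there, hello again"  # sample input, assumed for illustration

# /hello/ : first case-sensitive match
first = re.search(r"hello", text)

# /hello/i : first case-insensitive match
first_ci = re.search(r"hello", text, re.IGNORECASE)

# /hello/gi : every match; findall plays the role of the g flag
all_ci = re.findall(r"hello", text, re.IGNORECASE)

print(first.start())     # 13
print(first_ci.start())  # 0
print(all_ci)            # ['Hello', 'hello']
```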

Literal Matching

The simplest regex matches exact text:

/cat/     matches "cat" in "The cat sat on the mat"
/123/     matches "123" in "Order #12345"
/hello/   matches "hello" in "Say hello world"

Metacharacters

Special characters that have meaning in regex patterns:

. (Dot) - Any Single Character (Except Newline by Default)

/c.t/     matches "cat", "cot", "cut", "c9t", "c t"
/a.b/     matches "aab", "a1b", "a-b", "a b"

^ (Caret) - Start of String

/^Hello/  matches "Hello world"
          does NOT match "Say Hello"

$ (Dollar) - End of String

/world$/  matches "Hello world"
          does NOT match "world peace"
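The dot and the two anchors can be checked quickly in Python. This is a small sketch; the test strings are taken from the examples above plus two invented non-matches ("ct" and a string containing a newline):

```python
import re

# . matches any single character except a newline (by default)
candidates = ["cat", "c9t", "c t", "ct", "c\nt"]
dot_hits = [s for s in candidates if re.search(r"c.t", s)]

# ^ and $ anchor the match to the start and end of the string
starts = bool(re.search(r"^Hello", "Hello world"))
not_start = bool(re.search(r"^Hello", "Say Hello"))
ends = bool(re.search(r"world$", "Hello world"))
not_end = bool(re.search(r"world$", "world peace"))

print(dot_hits)                          # ['cat', 'c9t', 'c t']
print(starts, not_start, ends, not_end)  # True False True False
```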

Character Classes [ ]

Match any single character from a set:

/[aeiou]/     matches any vowel
/[0-9]/       matches any digit (0 through 9)
/[A-Z]/       matches any uppercase letter
/[a-zA-Z]/    matches any letter
/[a-zA-Z0-9]/ matches any alphanumeric character

Negated Character Classes [^ ]

/[^0-9]/      matches any character that is NOT a digit
/[^aeiou]/    matches any character that is NOT a vowel
/[^a-z]/      matches any character that is NOT a lowercase letter (a-z)
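To see what each class picks out of a real string, here is a short Python sketch. The sample text is an assumption made up for the example:

```python
import re

text = "Order #A42 shipped"  # sample input, invented for illustration

vowels = re.findall(r"[aeiou]", text)         # lowercase vowels only
digits = re.findall(r"[0-9]", text)
uppercase = re.findall(r"[A-Z]", text)
non_letters = re.findall(r"[^a-zA-Z]", text)  # negated class

print(vowels)       # ['e', 'i', 'e']
print(digits)       # ['4', '2']
print(uppercase)    # ['O', 'A']
print(non_letters)  # [' ', '#', '4', '2', ' ']
```

Character classes are case-sensitive unless the case-insensitive flag is set, which is why the capital "O" is not reported as a vowel.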

Predefined Character Classes

\d    matches any digit [0-9]
\D    matches any non-digit [^0-9]
\w    matches any word character [a-zA-Z0-9_]
\W    matches any non-word character
\s    matches any whitespace (space, tab, newline)
\S    matches any non-whitespace character

Phone Number Pattern
/\d{3}-\d{3}-\d{4}/

Matches: "555-123-4567"
Explanation: 3 digits, hyphen, 3 digits, hyphen, 4 digits
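A sketch of the phone pattern and two of the predefined classes in Python. The sample sentence and second phone number are invented for illustration. Note that in Python 3, \d and \w also match non-ASCII digits and letters in text patterns unless the re.ASCII flag is passed:

```python
import re

PHONE = re.compile(r"\d{3}-\d{3}-\d{4}")

text = "Call 555-123-4567 or 555-987-6543 today"  # invented sample
numbers = PHONE.findall(text)

# \w+ collects runs of word characters; \s+ splits on runs of whitespace
words = re.findall(r"\w+", "tab\tand  space")
parts = re.split(r"\s+", "tab\tand  space")

print(numbers)  # ['555-123-4567', '555-987-6543']
print(words)    # ['tab', 'and', 'space']
print(parts)    # ['tab', 'and', 'space']
```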

Quantifiers

Specify how many times a pattern element should repeat:

Basic Quantifiers

*       Zero or more      /ab*c/   matches "ac", "abc", "abbc", "abbbc"
+       One or more       /ab+c/   matches "abc", "abbc" (NOT "ac")
?       Zero or one       /colou?r/ matches "color" and "colour"
{n}     Exactly n         /a{3}/   matches "aaa"
{n,}    n or more         /a{2,}/  matches "aa", "aaa", "aaaa"...
{n,m}   Between n and m   /a{2,4}/ matches "aa", "aaa", "aaaa"
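The quantifier table can be verified with a small Python helper. The helper name `full` is hypothetical; it uses `re.fullmatch` so that the whole string must match, because an unanchored /a{2,4}/ would still find a match inside "aaaaa":

```python
import re

def full(pattern, s):
    """Return True when the entire string s matches pattern."""
    return re.fullmatch(pattern, s) is not None

star = [s for s in ["ac", "abc", "abbbc"] if full(r"ab*c", s)]
plus = [s for s in ["ac", "abc", "abbc"] if full(r"ab+c", s)]
optional = [s for s in ["color", "colour", "colouur"] if full(r"colou?r", s)]
bounded = [s for s in ["a", "aa", "aaaa", "aaaaa"] if full(r"a{2,4}", s)]

print(star)      # ['ac', 'abc', 'abbbc']
print(plus)      # ['abc', 'abbc']
print(optional)  # ['color', 'colour']
print(bounded)   # ['aa', 'aaaa']
```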

Greedy vs Lazy Matching

By default, quantifiers are greedy - they match as much as possible:

Pattern: /".+"/
Text:    "Hello" and "World"
Result:  Matches entire '"Hello" and "World"' (greedy)

Add ? after quantifier to make it lazy - match as little as possible:

Pattern: /".+?"/
Text:    "Hello" and "World"
Result:  Matches '"Hello"' and '"World"' separately (lazy)
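The same comparison in Python, as a minimal sketch. The third pattern is an addition not shown above: a negated character class, which gives the same result as the lazy version here without relying on backtracking:

```python
import re

text = '"Hello" and "World"'

greedy = re.findall(r'".+"', text)      # runs to the last quote
lazy = re.findall(r'".+?"', text)       # stops at the nearest closing quote
negated = re.findall(r'"[^"]+"', text)  # never crosses a quote at all

print(greedy)   # ['"Hello" and "World"']
print(lazy)     # ['"Hello"', '"World"']
print(negated)  # ['"Hello"', '"World"']
```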

Catastrophic Backtracking

Avoid nested quantifiers like (a+)+ or (.*)*. In backtracking engines (JavaScript, Python, Java, and most others) they can take exponential time on certain inputs, typically long strings that almost match, and freeze your application.

Groups & Capturing

Capturing Groups ( )

Parentheses create groups that capture matched text for later use:

Pattern: /(\d{3})-(\d{3})-(\d{4})/
Input:   "555-123-4567"

Captures:
  Group 0 (full match): "555-123-4567"
  Group 1: "555"   (area code)
  Group 2: "123"   (prefix)
  Group 3: "4567"  (line number)
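In Python the captures are read from the match object. A minimal sketch using the same pattern and input:

```python
import re

m = re.search(r"(\d{3})-(\d{3})-(\d{4})", "555-123-4567")

full_match = m.group(0)          # group 0 is always the whole match
area, prefix, line = m.groups()  # groups 1..3 as a tuple

print(full_match)          # 555-123-4567
print(area, prefix, line)  # 555 123 4567
```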

Non-Capturing Groups (?: )

Group patterns without capturing (keeps group numbering simple and avoids storing text you do not need):

/(?:Mr|Mrs|Ms)\.\s+(\w+)/

Matches: "Mr. Smith", "Mrs. Jones", "Ms. Davis"
Only captures the name, not the title
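The difference shows up clearly in what `re.findall` returns. A short sketch, assuming an invented sample sentence:

```python
import re

text = "Mr. Smith met Mrs. Jones and Ms. Davis"  # invented sample

# Non-capturing title: only the name is captured
names = re.findall(r"(?:Mr|Mrs|Ms)\.\s+(\w+)", text)

# Capturing title: each result becomes a (title, name) tuple
pairs = re.findall(r"(Mr|Mrs|Ms)\.\s+(\w+)", text)

print(names)  # ['Smith', 'Jones', 'Davis']
print(pairs)  # [('Mr', 'Smith'), ('Mrs', 'Jones'), ('Ms', 'Davis')]
```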

Alternation ( | )

Match one pattern OR another:

/cat|dog/       matches "cat" or "dog"
/gr(a|e)y/      matches "gray" or "grey"
/(red|blue|green) car/  matches color + " car"
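A quick Python sketch of the three alternation patterns. The input strings are invented, and the gray/grey pattern uses a non-capturing group because the capture is not needed:

```python
import re

pets = re.findall(r"cat|dog", "cat, dog, bird")

spellings = [w for w in ["gray", "grey", "griy"]
             if re.fullmatch(r"gr(?:a|e)y", w)]

color = re.search(r"(red|blue|green) car", "a blue car").group(1)

print(pets)       # ['cat', 'dog']
print(spellings)  # ['gray', 'grey']
print(color)      # blue
```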

Backreferences

Reference previously captured groups:

/(['"]).*?\1/   matches quoted strings (same quote type)
                  \1 refers to first captured group

Matches: "hello" or 'world'
No match: "hello' (mismatched quotes)
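The quoted-string pattern behaves the same way in Python. In this sketch the pattern is written as a triple-quoted raw string so that both quote characters can appear unescaped; the sample text is invented:

```python
import re

# (['"]) captures the opening quote; \1 requires the same character to close
QUOTED = re.compile(r"""(['"]).*?\1""")

text = "say \"hello\" or 'world'"  # sample input, invented for illustration
quoted = [m.group(0) for m in QUOTED.finditer(text)]

# The opening " is never closed by another ", so there is no match
mismatched = QUOTED.search("\"hello'")

print(quoted)      # ['"hello"', "'world'"]
print(mismatched)  # None
```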

Lookahead & Lookbehind

Assert what comes before or after the current position without including it in the match:

(?=...)   Positive lookahead  - must be followed by
(?!...)   Negative lookahead  - must NOT be followed by
(?<=...)  Positive lookbehind - must be preceded by
(?<!...)  Negative lookbehind - must NOT be preceded by

Examples:
/\d+(?=\s*dollars)/   matches "100" in "100 dollars"
/\b\d+\b(?!\s*cents)/ matches whole numbers NOT followed by "cents"
                      (without \b, \d+ backtracks and matches "10" in "100 cents")
/(?<=\$)\d+/          matches "50" in "$50"
/(?<![\$\d])\d+/      matches whole numbers NOT preceded by "$"
                      (a bare (?<!\$)\d+ still matches "0" in "$50")
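Negative lookarounds have a common pitfall: a bare \d+ can give back digits until the assertion passes, so it ends up matching part of a number. The Python sketch below shows the positive cases, the pitfall, and one way to guard against it with \b or by excluding digits in the lookbehind. The input strings are invented for illustration:

```python
import re

dollars = re.findall(r"\d+(?=\s*dollars)", "100 dollars")
after_sign = re.findall(r"(?<=\$)\d+", "$50")

# Pitfall: \d+ backtracks from "100" to "10", which is not followed by "cents"
naive_ahead = re.findall(r"\d+(?!\s*cents)", "100 cents")
strict_ahead = re.findall(r"\b\d+\b(?!\s*cents)", "100 cents and 5 dollars")

# Pitfall: the "0" in "$50" is preceded by "5", not "$"
naive_behind = re.findall(r"(?<!\$)\d+", "$50")
strict_behind = re.findall(r"(?<![$\d])\d+", "$50 and 75")

print(dollars, after_sign)          # ['100'] ['50']
print(naive_ahead, strict_ahead)    # ['10'] ['5']
print(naive_behind, strict_behind)  # ['0'] ['75']
```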

Practical Examples

Email Validation

/^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/

Breakdown:
^                   Start of string
[a-zA-Z0-9._%+-]+   Username (letters, numbers, special chars)
@                   Literal @ symbol
[a-zA-Z0-9.-]+      Domain name
\.                  Literal dot (escaped)
[a-zA-Z]{2,}        TLD (minimum 2 letters)
$                   End of string

Matches: user@example.com, first.last@mail.example.org
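A minimal Python sketch of the email check. The function name `looks_like_email` and the test addresses are assumptions for illustration. It uses `fullmatch` because in Python `$` also matches just before a trailing newline:

```python
import re

EMAIL = re.compile(r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$")

def looks_like_email(s):
    """Return True when s has the shape user@domain.tld (format check only)."""
    return EMAIL.fullmatch(s) is not None

print(looks_like_email("user@example.com"))                 # True
print(looks_like_email("first.last+tag@mail.example.org"))  # True
print(looks_like_email("user@localhost"))                   # False: no dot and TLD
print(looks_like_email("user@example.com\n"))               # False: trailing newline
```

This checks only the shape of the address, not whether the mailbox exists.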

URL Validation

/^(https?:\/\/)?([\w.-]+)\.([a-z]{2,})(\/\S*)?$/i

Breakdown:
(https?:\/\/)?      Optional http:// or https://
([\w.-]+)           Domain name
\.([a-z]{2,})       TLD (.com, .org, etc.)
(\/\S*)?            Optional path

Matches: example.com, https://www.site.org/page
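The groups show how the pattern splits a URL. A Python sketch follows; the forward slashes do not need escaping in a Python string. Note that the greedy domain group gives back its last label so that the TLD group has something to match:

```python
import re

URL = re.compile(r"^(https?://)?([\w.-]+)\.([a-z]{2,})(/\S*)?$", re.IGNORECASE)

full = URL.match("https://www.site.org/page").groups()
bare = URL.match("example.com").groups()
bad = URL.match("not a url")

print(full)  # ('https://', 'www.site', 'org', '/page')
print(bare)  # (None, 'example', 'com', None)
print(bad)   # None
```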

Password Strength

/^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$/

Requirements (using lookaheads):
(?=.*[a-z])         At least one lowercase
(?=.*[A-Z])         At least one uppercase
(?=.*\d)            At least one digit
(?=.*[@$!%*?&])     At least one special character
{8,}                Minimum 8 characters total
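A small Python sketch that exercises each requirement. The function name `is_strong` and the sample passwords are invented for illustration. Each lookahead scans from the start of the string without consuming anything, so all four conditions are checked independently before the final character class enforces the length and allowed characters:

```python
import re

STRONG = re.compile(
    r"^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$"
)

def is_strong(password):
    """Return True when the password satisfies every rule in the pattern."""
    return STRONG.fullmatch(password) is not None

print(is_strong("Passw0rd!"))   # True
print(is_strong("password1!"))  # False: no uppercase letter
print(is_strong("Sh0rt!a"))     # False: only 7 characters
print(is_strong("Passw0rd#"))   # False: "#" is not in the allowed set
```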

Credit Card Number

/^\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}$/

Matches:
1234567890123456
1234 5678 9012 3456
1234-5678-9012-3456
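A Python sketch that validates the layout and then strips the separators. The helper name `normalize_card` is hypothetical. The pattern checks only the 16-digit layout, not whether the number belongs to a real card:

```python
import re

CARD = re.compile(r"^\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}$")

def normalize_card(card):
    """Return the 16 digits without separators, or None if the layout is wrong."""
    if CARD.fullmatch(card) is None:
        return None
    return re.sub(r"[- ]", "", card)

print(normalize_card("1234 5678 9012 3456"))  # 1234567890123456
print(normalize_card("1234-5678-9012-3456"))  # 1234567890123456
print(normalize_card("1234-5678-9012"))       # None
```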

Date Extraction (MM/DD/YYYY)

/\b(0?[1-9]|1[0-2])\/(0?[1-9]|[12]\d|3[01])\/(\d{4})\b/

Matches: 1/15/2026, 01/15/2026, 12/31/2025
Groups capture: month, day, year separately
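A Python sketch of the extraction. The pattern accepts any day from 1 to 31 for every month, so the hypothetical helper `to_date` passes the captured groups to `datetime.date` to reject impossible dates such as 02/31. The sample sentence is invented:

```python
import re
from datetime import date

DATE = re.compile(r"\b(0?[1-9]|1[0-2])/(0?[1-9]|[12]\d|3[01])/(\d{4})\b")

def to_date(month, day, year):
    """Convert captured strings to a date, or None if the date does not exist."""
    try:
        return date(int(year), int(month), int(day))
    except ValueError:
        return None

text = "Due 1/15/2026, renewed 12/31/2025"  # invented sample
found = DATE.findall(text)
parsed = [to_date(*groups) for groups in found]

print(found)   # [('1', '15', '2026'), ('12', '31', '2025')]
print(parsed)  # [datetime.date(2026, 1, 15), datetime.date(2025, 12, 31)]
print(to_date(*DATE.search("02/31/2026").groups()))  # None
```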

IP Address

/^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/

Matches: 192.168.1.1, 10.0.0.255
Note: Does not validate range (0-255)
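One way to add the missing range check is to let the regex handle the shape and do the arithmetic in code. A minimal Python sketch; the function name `is_ipv4` is hypothetical, and re.ASCII is passed so that \d matches only 0-9:

```python
import re

IPV4 = re.compile(r"^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$", re.ASCII)

def is_ipv4(s):
    """Return True for four dot-separated numbers, each in the range 0-255."""
    m = IPV4.fullmatch(s)
    return m is not None and all(0 <= int(part) <= 255 for part in m.groups())

print(is_ipv4("192.168.1.1"))  # True
print(is_ipv4("10.0.0.255"))   # True
print(is_ipv4("256.1.1.1"))    # False: 256 is out of range
print(is_ipv4("1.2.3"))        # False: only three parts
```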

Find & Replace Example
Task: Swap first and last names

Find:    /(\w+)\s+(\w+)/
Replace: $2, $1

Input:  "John Doe"
Output: "Doe, John"
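The same swap in Python, as a minimal sketch. Python's `re.sub` writes group references as \1 and \2 rather than $1 and $2:

```python
import re

swapped = re.sub(r"(\w+)\s+(\w+)", r"\2, \1", "John Doe")

print(swapped)  # Doe, John
```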

Common Flags

g   Global - find all matches, not just the first
i   Case-insensitive - ignore uppercase/lowercase
m   Multiline - ^ and $ match line boundaries
s   Dotall - dot (.) matches newlines too
u   Unicode - enable full Unicode support
y   Sticky - match only from lastIndex position (JavaScript)
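The single-letter flags above are written as constants in Python. A short sketch of the multiline and dotall behavior, using an invented two-line string:

```python
import re

text = "first line\nsecond line"  # invented sample

# m flag: ^ matches at the start of every line, not just the string
without_m = re.findall(r"^\w+", text)
with_m = re.findall(r"^\w+", text, re.MULTILINE)

# s flag: the dot is allowed to cross the newline
without_s = re.search(r"first.*second", text)
with_s = re.search(r"first.*second", text, re.DOTALL)

print(without_m, with_m)              # ['first'] ['first', 'second']
print(without_s, with_s is not None)  # None True
```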

Usage in Different Languages

// JavaScript
const regex = /pattern/gi;
const matches = text.match(regex);
const result = text.replace(regex, 'replacement');

# Python
import re
match = re.search(r'pattern', text)
all_matches = re.findall(r'pattern', text)
result = re.sub(r'pattern', 'replacement', text)

// Java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
Pattern p = Pattern.compile("pattern", Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher(text);
while (m.find()) { /* process match, e.g. m.group() */ }

# Ruby
text.scan(/pattern/i) { |match| puts match }
text.gsub(/pattern/, 'replacement')

Best Practices

  • Start simple: Build complex patterns incrementally, testing each step
  • Test thoroughly: Use online tools like regex101.com to debug
  • Comment patterns: Complex regex should be documented
  • Escape metacharacters: Use \. \* \? \+ for literals
  • Use raw strings: Python r'...' avoids double-escaping
  • Be specific: Prefer \d+ over .+ when possible
  • Avoid catastrophic backtracking: Test patterns against long and nearly matching input strings
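Two of these practices, commenting patterns and escaping metacharacters, have direct support in Python. The sketch below uses the re.VERBOSE flag, which ignores whitespace in the pattern and allows # comments, and `re.escape`, which escapes metacharacters in a literal string for you:

```python
import re

# Documented pattern: whitespace and comments are ignored under re.VERBOSE
PHONE = re.compile(r"""
    (\d{3})   # area code
    -
    (\d{3})   # prefix
    -
    (\d{4})   # line number
""", re.VERBOSE)

parts = PHONE.search("555-123-4567").groups()

# Escaped literal: the dot in "3.14" must match a real dot
literal = re.escape("3.14")
hit = re.search(literal, "pi is 3.14") is not None
miss = re.search(literal, "3514") is not None

print(parts)      # ('555', '123', '4567')
print(hit, miss)  # True False
```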

Recommended Testing Tools

regex101.com - Best online tester with detailed explanations
regexr.com - Visual regex builder and reference
debuggex.com - Visual regex debugger with railroad diagrams

When NOT to Use Regex

  • Parsing HTML/XML: Use proper parsers like BeautifulSoup or lxml
  • Complex nested structures: JSON, code - use dedicated parsers
  • Simple string checks: startsWith(), includes() are faster
  • Email validation: Consider using built-in validators for production
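Two of the alternatives listed above, sketched in Python with the standard library only. The file name, the HTML snippet, and the class name `LinkCollector` are invented for illustration:

```python
from html.parser import HTMLParser

# Simple string checks: no regex needed
filename = "report_2026.csv"
is_report = filename.startswith("report_")
is_csv = filename.endswith(".csv")
has_year = "2026" in filename

# HTML: let a parser deal with tags and attributes
class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(v for k, v in attrs if k == "href" and v)

collector = LinkCollector()
collector.feed('<p><a href="/one">1</a> <a class="x" href="/two">2</a></p>')

print(is_report, is_csv, has_year)  # True True True
print(collector.links)              # ['/one', '/two']
```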

Regex is incredibly powerful but has limitations. For complex data structures, use dedicated parsers. For simple operations, built-in string methods are often faster and more readable.