RegexBuilder: Swift's Type-Safe Regular Expression DSL


You have written NSRegularExpression patterns in Objective-C. You have wrestled with escape sequences inside raw strings. You have shipped regex patterns that compiled fine but silently matched the wrong thing because a capture group was off by one. Swift’s RegexBuilder DSL eliminates that entire class of bugs by making regular expressions type-safe, composable, and readable at the call site.

This post covers the RegexBuilder DSL end-to-end: literal regex syntax, the builder DSL, typed captures, TryCapture with transforms, and real-world patterns for parsing structured text. We will not cover the Swift runtime internals of the regex engine or custom RegexComponent conformances from scratch — those deserve their own deep dives.

This guide assumes you are familiar with strings and characters in Swift, closures, and result builders.

The Problem

Imagine you are building a Pixar movie catalog service. You receive plaintext log lines from a legacy system that track box office results, and you need to extract structured data from each line.

A typical log entry looks like this:

[2023-06-16] RELEASE "Inside Out 2" studio:Pixar gross:$1,566,380,600

Here is the classic approach using a raw string regex:

import Foundation

let logLine = #"[2023-06-16] RELEASE "Inside Out 2" studio:Pixar gross:$1,566,380,600"#

let pattern = #"\[(\d{4}-\d{2}-\d{2})\] RELEASE "(.+?)" studio:(\w+) gross:\$([0-9,]+)"#
let regex = try! NSRegularExpression(pattern: pattern)

if let match = regex.firstMatch(
    in: logLine,
    range: NSRange(logLine.startIndex..., in: logLine)
) {
    let dateRange = Range(match.range(at: 1), in: logLine)!
    let titleRange = Range(match.range(at: 2), in: logLine)!
    let studioRange = Range(match.range(at: 3), in: logLine)!
    let grossRange = Range(match.range(at: 4), in: logLine)!

    print(logLine[dateRange])  // 2023-06-16
    print(logLine[titleRange]) // Inside Out 2
    print(logLine[studioRange]) // Pixar
    print(logLine[grossRange]) // 1,566,380,600
}

This works, but the problems are substantial:

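  • No compile-time validation. The pattern is an opaque string. An invalid pattern only surfaces at runtime (here as a crash from try!), and a typo that still forms a valid pattern silently matches the wrong thing.
  • Untyped captures. Every capture group comes back as an NSRange that you convert to a Substring by hand. Converting further to Date, Int, or domain types is manual, and nothing stops you from mixing up the indices.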
  • NSRange bridging. The dance between NSRange and Swift Range<String.Index> is tedious and error-prone.
  • Readability. Six months from now, no one on your team (including you) will be able to tell what (\d{4}-\d{2}-\d{2}) captures without reading the whole expression.

Regex Literals vs. RegexBuilder

Swift 5.7 introduced two complementary ways to write regular expressions, both producing the same Regex<Output> type. SE-0350 defines the type, SE-0354 the literal syntax, and SE-0351 the builder DSL.

Regex Literals

The regex literal syntax uses forward slashes and is concise for simple patterns:

let datePattern = /\d{4}-\d{2}-\d{2}/
let input = "Toy Story was released on 1995-11-22"

if let match = input.firstMatch(of: datePattern) {
    print(match.output) // 1995-11-22
}
1995-11-22

Regex literals are checked at compile time — a malformed pattern is a compiler error, not a runtime crash. But for complex patterns with multiple captures, they still look like traditional regex and carry the same readability burden.
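Regex literals carry typed captures as well. As a small illustration (the releaseDate name and the group names are made up for this sketch), named groups in a literal become labels on the output tuple. The sketch uses the extended #/.../# delimiters, which should compile even in projects where bare-slash literals are not enabled.

```swift
// Named groups in a literal become tuple labels on the match output.
let releaseDate = #/(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/#
let sentence = "Toy Story was released on 1995-11-22"

let dateMatch = sentence.firstMatch(of: releaseDate)
let year = dateMatch.map { String($0.output.year) }
let month = dateMatch.map { String($0.output.month) }
print(year ?? "no match", month ?? "no match")
```

Misspelling a label such as .output.yaer is a compile-time error, which is the same safety the builder DSL provides through tuple positions.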

RegexBuilder DSL

The RegexBuilder module provides a result-builder-powered DSL that reads like structured code. The same date pattern becomes:

import RegexBuilder

let datePattern = Regex {
    Repeat(.digit, count: 4)
    "-"
    Repeat(.digit, count: 2)
    "-"
    Repeat(.digit, count: 2)
}

Both approaches produce the same Regex type under the hood. The difference is entirely at the authoring layer: the DSL trades brevity for clarity, and that trade-off pays for itself the moment a pattern has more than one or two capture groups.

Tip: You can mix regex literals and builder DSL in the same expression. Embed a literal inside a builder block with Regex { /\d+/ } when a sub-pattern is simple enough that the literal form is clearer.

Building Patterns with the DSL

Let us rebuild the log-line parser from the problem section using the builder DSL. We will go step by step so you can see how each RegexBuilder component maps to its regex equivalent.

Character Classes and Quantifiers

RegexBuilder provides quantifier types like One, Repeat, Optionally, ZeroOrMore, and OneOrMore. Each one wraps another regex component, such as a character class, a string literal, or a nested builder block.

import RegexBuilder

let yearPattern = Repeat(.digit, count: 4)      // \d{4}
let separator = "-"                               // literal "-"
let monthOrDay = Repeat(.digit, count: 2)         // \d{2}

let dateRegex = Regex {
    yearPattern
    separator
    monthOrDay
    separator
    monthOrDay
}

Notice how you can extract sub-patterns into local constants and compose them. This is result builders at work — if you have built custom DSLs with @resultBuilder, the mechanics here will feel familiar. See Result Builders in Swift for that background.
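The date example only exercises Repeat. As a rough sketch of the other quantifiers (the sequelRegex name and the "title plus optional sequel number" format are assumptions for illustration), Optionally wraps a nested block that may or may not be present:

```swift
import RegexBuilder

// Hypothetical format: a one-word title, optionally followed by a sequel number.
let sequelRegex = Regex {
    OneOrMore(.word)          // \w+
    Optionally {              // (?: ... )?
        One(.whitespace)      // \s
        OneOrMore(.digit)     // \d+
    }
}

let matchesSequel = "Cars 3".wholeMatch(of: sequelRegex) != nil
let matchesOriginal = "Coco".wholeMatch(of: sequelRegex) != nil
let matchesWords = "Cars three".wholeMatch(of: sequelRegex) != nil
print(matchesSequel, matchesOriginal, matchesWords)
```

wholeMatch(of:) requires the pattern to cover the entire string, so "Cars three" is rejected even though its first word would match on its own.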

Alternation and Grouping

Use ChoiceOf for alternation (the | operator in traditional regex):

let studioRegex = Regex {
    ChoiceOf {
        "Pixar"
        "Disney"
        "Marvel"
        "Lucasfilm"
    }
}

let line = "studio:Pixar"
if let match = line.firstMatch(of: studioRegex) {
    print(match.output) // Pixar
}
Pixar

ChoiceOf is type-aware as well. When the branches contain no captures, as here, the output is a plain Substring. When branches do contain captures, each capture appears in the output tuple as an optional, because only one branch can match at a time.

Anchors and Boundaries

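Anchors match positions rather than characters. Anchor.wordBoundary corresponds to \b. Anchor.startOfLine and Anchor.endOfLine correspond to ^ and $ in multiline mode, so they also match next to line breaks inside the input. To anchor to the whole input instead, use Anchor.startOfSubject and Anchor.endOfSubject. The example below anchors to the start of a line: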

let lineStartRegex = Regex {
    Anchor.startOfLine
    "["
    Repeat(.digit, count: 4)
}
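
To see the anchor doing some work, here is a minimal sketch over multi-line input. The notes text and the leadingYear name are invented for the example; the point is that the bracketed year in the middle line is skipped because it does not begin a line.

```swift
import RegexBuilder

let notes = """
[1995] Toy Story
see also [1998]
[2003] Finding Nemo
"""

// Matches a bracketed year only when it begins a line.
let leadingYear = Regex {
    Anchor.startOfLine
    "["
    Repeat(.digit, count: 4)
    "]"
}

let leadingYears = notes.matches(of: leadingYear).map { String($0.output) }
print(leadingYears)
```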

Typed Captures and TryCapture

This is where RegexBuilder truly shines. Captures are not untyped substrings — they are generic type parameters baked into the Regex<Output> type itself.

Basic Capture

Wrap any component in Capture to extract its matched text:

import RegexBuilder

let movieTitleRegex = Regex {
    "\""
    Capture {
        OneOrMore(.reluctant) {
            /./
        }
    }
    "\""
}
// Type: Regex<(Substring, Substring)>

let input = #"RELEASE "Finding Nemo" studio:Pixar"#
if let match = input.firstMatch(of: movieTitleRegex) {
    let fullMatch: Substring = match.output.0
    let title: Substring = match.output.1
    print(title)
}
Finding Nemo

The compiler knows match.output is a tuple of (Substring, Substring) — the full match plus one captured substring. You cannot accidentally access .output.2 because it does not exist.

TryCapture with Transforms

TryCapture accepts a transform closure that converts the matched substring into a specific type. If the transform returns nil, the regex engine backtracks — exactly like a failed lookahead.

import RegexBuilder
import Foundation

struct BoxOfficeEntry {
    let title: String
    let gross: Int
}

let grossRegex = Regex {
    "gross:$"
    TryCapture {
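        // Digits and thousands separators only. A reluctant OneOrMore(.any)
        // at the end of a pattern would stop after a single character.
        OneOrMore {
            CharacterClass(.digit, .anyOf(","))
        }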
    } transform: { matched -> Int? in
        let cleaned = matched.replacing(",", with: "")
        return Int(cleaned)
    }
}
// Type: Regex<(Substring, Int)>

The output type of this regex is Regex<(Substring, Int)> — not (Substring, Substring). The transform is encoded in the type system. If the closure returns nil (say, the matched text is not a valid integer), the engine backtracks and tries alternative matches rather than crashing or returning garbage.
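
The same mechanism works for domain types. The sketch below assumes a hypothetical Studio enum and a studioField pattern (neither is part of the log parser above) and shows a transform that rejects unknown values by returning nil:

```swift
import RegexBuilder

// Hypothetical domain type for this sketch.
enum Studio: String {
    case pixar = "Pixar"
    case disney = "Disney"
}

let studioField = Regex {
    "studio:"
    TryCapture {
        OneOrMore(.word)
    } transform: { name -> Studio? in
        Studio(rawValue: String(name))  // nil rejects this attempt
    }
}
// Type: Regex<(Substring, Studio)>

let known = "studio:Pixar".firstMatch(of: studioField)?.output.1
let unknown = "studio:Illumination".firstMatch(of: studioField)
print(known as Any, unknown == nil)
```

One caveat of the backtracking behavior: after a nil, the engine retries with shorter spans, so an input like "studio:Pixarland" would still match with the capture "Pixar". Adding Anchor.wordBoundary after the capture is one way to rule that out.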

Putting It All Together

Now let us build the complete log-line parser. Compare this to the NSRegularExpression version from earlier:

import RegexBuilder
import Foundation

let logLineRegex = Regex {
    "["
    Capture {
        Repeat(.digit, count: 4)
        "-"
        Repeat(.digit, count: 2)
        "-"
        Repeat(.digit, count: 2)
    }
    "] RELEASE \""
    Capture {
        OneOrMore(.reluctant) {
            /./
        }
    }
    "\" studio:"
    Capture {
        OneOrMore(.word)
    }
    " gross:$"
    TryCapture {
        OneOrMore {
            CharacterClass(
                .digit,
                .anyOf(",")
            )
        }
    } transform: { matched -> Int? in
        Int(matched.replacing(",", with: ""))
    }
}
// Type: Regex<(Substring, Substring, Substring, Substring, Int)>

let logLine = #"[2023-06-16] RELEASE "Inside Out 2" studio:Pixar gross:$1,566,380,600"#

if let match = logLine.firstMatch(of: logLineRegex) {
    let (_, date, title, studio, gross) = match.output
    print("Date: \(date)")
    print("Title: \(title)")
    print("Studio: \(studio)")
    print("Gross: \(gross)")
}
Date: 2023-06-16
Title: Inside Out 2
Studio: Pixar
Gross: 1566380600

Four captures, each with a type the compiler knows, and zero index juggling. If you add, remove, or retype a capture, the shape of match.output changes and the compiler flags every call site that destructures it. That is the value proposition of RegexBuilder in one example.
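
In practice the tuple usually feeds a model type. The following is a minimal sketch under stated assumptions: the Release struct, the releaseRegex name, and the sample lines array are illustrative, and the pattern is a trimmed-down variant of the one above with the date omitted for brevity.

```swift
import RegexBuilder

// Hypothetical record type; field names are illustrative.
struct Release {
    let title: String
    let studio: String
    let gross: Int
}

let releaseRegex = Regex {
    "RELEASE \""
    Capture { OneOrMore(.any, .reluctant) }
    "\" studio:"
    Capture { OneOrMore(.word) }
    " gross:$"
    TryCapture {
        OneOrMore { CharacterClass(.digit, .anyOf(",")) }
    } transform: { digits -> Int? in
        Int(digits.replacing(",", with: ""))
    }
}

let lines = [
    #"[2023-06-16] RELEASE "Inside Out 2" studio:Pixar gross:$1,566,380,600"#,
    "malformed line",
]

// Lines that do not match are dropped by compactMap.
let releases = lines.compactMap { line -> Release? in
    guard let match = line.firstMatch(of: releaseRegex) else { return nil }
    let (_, title, studio, gross) = match.output
    return Release(title: String(title), studio: String(studio), gross: gross)
}
print(releases.count)
```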

Apple Docs: RegexBuilder — Swift Standard Library

Advanced Usage

Reusable Components with RegexComponent

Any type conforming to RegexComponent can be dropped into a builder block. This lets you build a library of domain-specific parsers:

import RegexBuilder
import Foundation

struct PixarDateParser: RegexComponent {
    // RegexComponent's one requirement is a `regex` property.
    // RegexOutput is inferred from its type: full match plus the Date.
    var regex: Regex<(Substring, Date)> {
        Regex {
            TryCapture {
                Repeat(.digit, count: 4)
                "-"
                Repeat(.digit, count: 2)
                "-"
                Repeat(.digit, count: 2)
            } transform: { substring -> Date? in
                let formatter = ISO8601DateFormatter()
                formatter.formatOptions = [.withFullDate]
                return formatter.date(from: String(substring))
            }
        }
    }
}

// Usage — drop it into any builder block
let entryRegex = Regex {
    "["
    PixarDateParser()
    "]"
}
// Type: Regex<(Substring, Date)>

The PixarDateParser is testable in isolation, reusable across patterns, and its output type (Date) propagates into any regex that includes it. This is composition at its best.
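
To make "testable in isolation" concrete, here is a sketch of a second component checked on its own and then composed. GrossAmount and grossField are hypothetical names for this example; the only protocol requirement relied on is RegexComponent's regex property.

```swift
import RegexBuilder

// Hypothetical component: parses "1,566,380,600" into an Int.
struct GrossAmount: RegexComponent {
    var regex: Regex<(Substring, Int)> {
        Regex {
            TryCapture {
                OneOrMore { CharacterClass(.digit, .anyOf(",")) }
            } transform: { digits -> Int? in
                Int(digits.replacing(",", with: ""))
            }
        }
    }
}

// Checked on its own...
let amount = "373,554,033".wholeMatch(of: GrossAmount())?.output.1

// ...and composed into a larger pattern.
let grossField = Regex {
    "gross:$"
    GrossAmount()
}
let composedGross = "gross:$940,335,536".firstMatch(of: grossField)?.output.1
print(amount as Any, composedGross as Any)
```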

Reluctant and Possessive Quantifiers

By default, quantifiers in RegexBuilder are eager (greedy), which the API spells .eager. You change this with the .reluctant and .possessive repetition behaviors:

import RegexBuilder

// Greedy: matches as much as possible, then backtracks
let greedy = OneOrMore(.any)

// Reluctant: matches as little as possible
let reluctant = OneOrMore(.any, .reluctant)

// Possessive: matches as much as possible, never backtracks
let possessive = OneOrMore(.any, .possessive)

Warning: Possessive quantifiers (.possessive) never backtrack. Use them only when you are certain the matched segment cannot overlap with what follows. Getting this wrong does not crash — it just silently fails to match. Prefer .reluctant when parsing delimited content like quoted strings.
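
The difference is easiest to see on one input. This sketch uses an invented sample string and a hypothetical quoted(_:) helper that builds the same quoted-span pattern with each behavior:

```swift
import RegexBuilder

let text = "say \"Woody\" and \"Buzz\" now"

// Builds a quoted-span pattern with the given repetition behavior.
func quoted(_ behavior: RegexRepetitionBehavior) -> Regex<Substring> {
    Regex {
        "\""
        OneOrMore(.any, behavior)
        "\""
    }
}

// Eager: runs to the end, then backs up to the last quote.
let eagerMatch = text.firstMatch(of: quoted(.eager)).map { String($0.output) }
// Reluctant: stops at the first closing quote.
let reluctantMatch = text.firstMatch(of: quoted(.reluctant)).map { String($0.output) }
// Possessive: swallows the closing quote too and never gives it back.
let possessiveMatch = text.firstMatch(of: quoted(.possessive)).map { String($0.output) }
print(eagerMatch as Any, reluctantMatch as Any, possessiveMatch as Any)
```

Here the eager pattern returns "Woody" and "Buzz" as one span, the reluctant pattern returns only "Woody" with its quotes, and the possessive pattern finds nothing.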

References and Named Captures

RegexBuilder supports back-references through the Reference type. This is useful when you need to match a repeated element:

import RegexBuilder

let quoteRef = Reference(Substring.self)

let quotedStringRegex = Regex {
    Capture(as: quoteRef) {
        ChoiceOf {
            "\""
            "'"
        }
    }
    OneOrMore(.reluctant) {
        /./
    }
    quoteRef // Must match the same quote character
}

let input = #"'Buzz Lightyear'"#
if let match = input.firstMatch(of: quotedStringRegex) {
    print(match.output.0) // 'Buzz Lightyear'
}
'Buzz Lightyear'

The Reference ensures the closing delimiter matches the opening one: single quote matches single quote, double matches double. A traditional pattern does this with a numbered backreference such as \1. The Reference replaces that positional index with a named, typed value.
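
A Reference also works as a name for reading a capture back out of a match, which is the closest RegexBuilder equivalent to named capture groups. A minimal sketch, assuming a made-up "Title (Year)" format and hypothetical titleRef and yearRef names:

```swift
import RegexBuilder

let titleRef = Reference(Substring.self)
let yearRef = Reference(Int.self)

let creditRegex = Regex {
    Capture(as: titleRef) {
        OneOrMore(.any, .reluctant)
    }
    " ("
    TryCapture(as: yearRef) {
        Repeat(.digit, count: 4)
    } transform: { digits -> Int? in
        Int(digits)
    }
    ")"
}

let credit = "Ratatouille (2007)".wholeMatch(of: creditRegex)
// Subscripting the match with a Reference returns the typed capture.
let creditTitle = credit.map { String($0[titleRef]) }
let creditYear = credit.map { $0[yearRef] }
print(creditTitle as Any, creditYear as Any)
```

Reading by reference keeps call sites stable when captures are reordered, at the cost of declaring the references up front.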

Integration with Swift’s String Processing APIs

Regex values integrate natively with Swift’s string processing methods. You are not limited to firstMatch(of:):

import RegexBuilder

let priceRegex = Regex {
    "$"
    Capture {
        OneOrMore {
            CharacterClass(.digit, .anyOf(","))
        }
    }
}

let report = """
    Toy Story: $373,554,033
    Finding Nemo: $940,335,536
    Coco: $807,082,196
    """

// All matches
let matches = report.matches(of: priceRegex)
for match in matches {
    print(match.output.1)
}

// Replace
let redacted = report.replacing(priceRegex, with: "$[REDACTED]")

// Split
let segments = "Woody|Buzz|Jessie".split(separator: /\|/)
373,554,033
940,335,536
807,082,196

These algorithms were proposed in SE-0357, alongside the core Regex type from SE-0350, and are available on both String and Substring.
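
Two more of these algorithms are worth a quick sketch: contains(_:) for yes/no checks, and the closure form of replacing(_:with:), which hands each typed match to the closure. The grossRegex name and the sample line are assumptions for illustration.

```swift
import RegexBuilder

let grossRegex = Regex {
    "$"
    TryCapture {
        OneOrMore { CharacterClass(.digit, .anyOf(",")) }
    } transform: { digits -> Int? in
        Int(digits.replacing(",", with: ""))
    }
}

let line = "Toy Story: $373,554,033; Coco: $807,082,196"

// contains(_:) answers a yes/no question without extracting anything.
let hasGross = line.contains(grossRegex)

// The closure receives each match, so the Int capture is available directly.
let summary = line.replacing(grossRegex, with: { match in
    "$\(match.output.1 / 1_000_000)M"
})
print(hasGross, summary)
```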

Performance Considerations

Regex literals are parsed and validated at compile time, but the matching itself happens at runtime. Here are the performance characteristics you should know:

Compilation cost. A Regex value is constructed each time the expression that creates it is evaluated, so a regex built inside a function body is rebuilt on every call. Store it in a static property or top-level constant if you are matching inside a tight loop:

import RegexBuilder

enum MovieLogParser {
    // Compiled once, reused across calls
    static let logRegex = Regex {
        "["
        Capture { OneOrMore(.digit) }
        "]"
    }

    static func parseAll(_ lines: [String]) -> [Substring] {
        lines.compactMap { line in
            line.firstMatch(of: logRegex)?.output.1
        }
    }
}

Backtracking. Complex patterns with nested quantifiers can exhibit exponential backtracking, as in any backtracking regex engine. The .possessive quantifier and atomic groups (Local { ... }) prevent unnecessary backtracking:

import RegexBuilder

// Atomic group — once matched, the engine will not backtrack into this block
let efficientPattern = Regex {
    Local {
        OneOrMore(.digit)
    }
    ";"
}
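
Local changes what can match, not only how fast. The sketch below uses two deliberately contrived patterns (the backtracking and atomic names are made up) where the trailing One(.digit) can only succeed if the quantifier gives a digit back:

```swift
import RegexBuilder

// Without Local: OneOrMore can give back one digit for the trailing One(.digit).
let backtracking = Regex {
    OneOrMore(.digit)
    One(.digit)
}

// With Local: the digits are consumed atomically, so nothing is given back.
let atomic = Regex {
    Local {
        OneOrMore(.digit)
    }
    One(.digit)
}

let plainResult = "1995".firstMatch(of: backtracking) != nil
let atomicResult = "1995".firstMatch(of: atomic) != nil
print(plainResult, atomicResult)
```

The first pattern matches and the second does not, which is the same silent-failure risk the possessive warning describes.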

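Unicode correctness. Swift regexes operate on Character (extended grapheme cluster) boundaries by default, which is correct but slower than byte-level matching. If you are processing ASCII-only data, Regex offers the asciiOnlyDigits(), asciiOnlyWordCharacters(), asciiOnlyWhitespace(), and asciiOnlyCharacterClasses() methods to restrict the built-in character classes to ASCII, and matchingSemantics(.unicodeScalar) to match by Unicode scalar instead of by Character. Measure before assuming either option is faster for your input.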
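
As a sketch of the semantic difference (not a benchmark), the following compares the default .digit class with an ASCII-restricted copy of the same regex, using the asciiOnlyDigits() method on Regex. The variable names and the sample string are assumptions for illustration.

```swift
import RegexBuilder

let digits = Regex { OneOrMore(.digit) }
let asciiDigits = digits.asciiOnlyDigits()

let arabicIndic = "٣٧٣"  // Arabic-Indic digits, not ASCII

let unicodeHit = arabicIndic.contains(digits)      // default: any Unicode decimal digit
let asciiHit = arabicIndic.contains(asciiDigits)   // restricted to 0-9
let asciiPlain = "373".contains(asciiDigits)
print(unicodeHit, asciiHit, asciiPlain)
```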

Tip: Profile with Instruments’ Time Profiler before micro-optimizing regex performance. In most apps, the cost of regex matching is negligible compared to network I/O or view layout. Optimize only when the profiler tells you to.

When to Use (and When Not To)

Scenario | Recommendation
Parsing structured logs with multiple fields | Use RegexBuilder. Typed captures and composable components pay for themselves immediately.
Simple one-off string checks (e.g., “does this contain a digit?”) | Use a regex literal (/\d/). The builder DSL adds ceremony without benefit for trivial patterns.
Parsing well-defined formats like JSON or XML | Do not use regex at all. Use JSONDecoder, XMLParser, or a dedicated parser. Regex cannot handle recursive grammars.
Validating user input (email, phone) | Regex literals are fine for quick validation. For production email validation, prefer NSDataDetector or server-side checks.
Processing CSV or fixed-width tabular data | RegexBuilder works well here, especially with TryCapture for numeric conversions. Consider Scanner for simpler cases.
Migrating from NSRegularExpression | Replace incrementally. RegexBuilder and NSRegularExpression can coexist in the same codebase during migration.

RegexBuilder requires iOS 16+ / macOS 13+ (Swift 5.7 runtime). If your deployment target is earlier, you are stuck with NSRegularExpression. There is no backport available.

Note: RegexBuilder was introduced in Swift 5.7 through SE-0351 and demonstrated in the WWDC 2022 sessions Meet Swift Regex and Swift Regex: Beyond the Basics. Both sessions are worth watching for the design rationale and live demos.

Summary

  • RegexBuilder is a result-builder DSL that makes regular expressions type-safe, composable, and readable — no more counting parentheses in opaque pattern strings.
  • Typed captures encode the output type directly in Regex<Output>. The compiler prevents index mismatches and type confusion at every call site.
  • TryCapture with transforms converts matched text into domain types (Int, Date, custom types) inline, with automatic backtracking on failure.
  • Custom RegexComponent types let you build reusable, testable parsing components that compose cleanly inside any builder block.
  • Performance is predictable — store compiled regex values as static properties, use .possessive or Local to avoid backtracking, and profile before optimizing.

If you found the result-builder mechanics behind RegexBuilder interesting, explore how the same @resultBuilder attribute powers SwiftUI and custom DSLs in Result Builders in Swift. For patterns where you need to decode structured data from network responses rather than raw strings, see Mastering Codable in Swift.