RegexBuilder: Swift's Type-Safe Regular Expression DSL
You have written NSRegularExpression patterns in Objective-C. You have wrestled with escape sequences inside raw
strings. You have shipped regex patterns that compiled fine but silently matched the wrong thing because a capture group
was off by one. Swift’s RegexBuilder DSL eliminates that entire class of bugs by making regular expressions type-safe,
composable, and readable at the call site.
This post covers the RegexBuilder DSL end-to-end: literal regex syntax, the builder DSL, typed captures, TryCapture
with transforms, and real-world patterns for parsing structured text. We will not cover the Swift runtime internals of
the regex engine or custom RegexComponent conformances from scratch — those deserve their own deep dives.
This guide assumes you are familiar with strings and characters in Swift, closures, and result builders.
Contents
- The Problem
- Regex Literals vs. RegexBuilder
- Building Patterns with the DSL
- Typed Captures and TryCapture
- Advanced Usage
- Performance Considerations
- When to Use (and When Not To)
- Summary
The Problem
Imagine you are building a Pixar movie catalog service. You receive plaintext log lines from a legacy system that track box office results, and you need to extract structured data from each line.
A typical log entry looks like this:
[2023-06-16] RELEASE "Inside Out 2" studio:Pixar gross:$1,566,380,600
Here is the classic approach using a raw string regex:
import Foundation
let logLine = #"[2023-06-16] RELEASE "Inside Out 2" studio:Pixar gross:$1,566,380,600"#
let pattern = #"\[(\d{4}-\d{2}-\d{2})\] RELEASE "(.+?)" studio:(\w+) gross:\$([0-9,]+)"#
let regex = try! NSRegularExpression(pattern: pattern)
if let match = regex.firstMatch(
    in: logLine,
    range: NSRange(logLine.startIndex..., in: logLine)
) {
    // Capture groups are addressed by position; index 0 is the whole match.
    let dateRange = Range(match.range(at: 1), in: logLine)!
    let titleRange = Range(match.range(at: 2), in: logLine)!
    let studioRange = Range(match.range(at: 3), in: logLine)!
    let grossRange = Range(match.range(at: 4), in: logLine)!
    print(logLine[dateRange]) // 2023-06-16
    print(logLine[titleRange]) // Inside Out 2
    print(logLine[studioRange]) // Pixar
    print(logLine[grossRange]) // 1,566,380,600
}
This works, but the problems are substantial:
- No compile-time validation. A typo in the pattern string compiles just fine and only surfaces at runtime.
- Untyped captures. Every capture group comes back as a range that you slice into a Substring. You need manual conversion to Date, Int, or domain types, and you can mix up the indices.
- NSRange bridging. The dance between NSRange and Swift Range<String.Index> is tedious and error-prone.
- Readability. Six months from now, no one on your team (including you) will be able to tell what (\d{4}-\d{2}-\d{2}) captures without reading the whole expression.
Regex Literals vs. RegexBuilder
Swift 5.7 introduced two complementary ways to write regular expressions, both powered by the same Regex<Output> type
defined in SE-0350 and SE-0351.
Regex Literals
The regex literal syntax uses forward slashes and is concise for simple patterns:
let datePattern = /\d{4}-\d{2}-\d{2}/
let input = "Toy Story was released on 1995-11-22"
if let match = input.firstMatch(of: datePattern) {
    print(match.output) // 1995-11-22
}
1995-11-22
Regex literals are checked at compile time — a malformed pattern is a compiler error, not a runtime crash. But for complex patterns with multiple captures, they still look like traditional regex and carry the same readability burden.
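Literals can ease some of that burden with named capture groups, which surface as labeled tuple elements on the match output. The following is a minimal sketch; the names `releaseDate` and `sample` are illustrative, and the extended `#/.../#` delimiters are used so the snippet compiles even where bare-slash literals are not enabled.

```swift
// Named groups become labels on the output tuple:
// Regex<(Substring, year: Substring, month: Substring, day: Substring)>
let releaseDate = #/(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/#

let sample = "Toy Story was released on 1995-11-22"
if let match = sample.firstMatch(of: releaseDate) {
    // Access captures by name instead of by position.
    print(match.output.year, match.output.month, match.output.day)
}
```

The labels help with positional mix-ups, but the captures are still plain Substring values, and the pattern itself is still dense.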
RegexBuilder DSL
The RegexBuilder module provides a result-builder-powered DSL that reads like structured code. The same date pattern
becomes:
import RegexBuilder
let datePattern = Regex {
    Repeat(.digit, count: 4)
    "-"
    Repeat(.digit, count: 2)
    "-"
    Repeat(.digit, count: 2)
}
Both approaches produce the same Regex type under the hood. The difference is entirely at the authoring layer: the DSL
trades brevity for clarity, and that trade-off pays for itself the moment a pattern has more than one or two capture
groups.
Tip: You can mix regex literals and builder DSL in the same expression. Embed a literal inside a builder block with
Regex { /\d+/ } when a sub-pattern is simple enough that the literal form is clearer.
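As a rough illustration of that mix, the sketch below wraps a short literal in a builder-level Capture. The name `grossField` and the sample string are made up for this example, and the literal uses `#/.../#` delimiters so it does not depend on bare-slash support.

```swift
import RegexBuilder

let grossField = Regex {
    "gross:$"              // string literals in a builder match verbatim
    Capture {
        #/[0-9,]+/#        // a compact literal for the simple part
    }
}

let field = "studio:Pixar gross:$373,554,033"
if let match = field.firstMatch(of: grossField) {
    print(match.output.1)  // the captured digits and commas
}
```

The builder handles the structure and the capture, while the literal stays short enough to read at a glance.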
Building Patterns with the DSL
Let us rebuild the log-line parser from the problem section using the builder DSL. We will go step by step so you can
see how each RegexBuilder component maps to its regex equivalent.
Character Classes and Quantifiers
RegexBuilder provides types like One, Repeat, Optionally, ZeroOrMore, and OneOrMore. Each wraps either a single
component, such as a character class, or a nested builder block.
import RegexBuilder
let yearPattern = Repeat(.digit, count: 4) // \d{4}
let separator = "-"                        // literal "-"
let monthOrDay = Repeat(.digit, count: 2)  // \d{2}
let dateRegex = Regex {
    yearPattern
    separator
    monthOrDay
    separator
    monthOrDay
}
Notice how you can extract sub-patterns into local constants and compose them. This is result builders at work — if you
have built custom DSLs with @resultBuilder, the mechanics here will feel familiar. See
Result Builders in Swift for that background.
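The other quantifiers follow the same shape. Here is a small sketch that combines OneOrMore, ZeroOrMore, a ranged Repeat, and Optionally; the version-tag format ("v2", "v2.1.10-beta") is a made-up example, not something from the log format above.

```swift
import RegexBuilder

// Hypothetical version tag such as "v2", "v2.1" or "v2.1.10-beta".
let versionTag = Regex {
    "v"
    OneOrMore(.digit)          // \d+
    ZeroOrMore {               // (?:\.\d{1,3})*
        "."
        Repeat(.digit, 1...3)
    }
    Optionally {               // (?:-beta)?
        "-beta"
    }
}

print("v2.1.10-beta".wholeMatch(of: versionTag) != nil)
print("2.1".wholeMatch(of: versionTag) != nil)
```

The comments show the traditional syntax each component stands in for, which is a handy habit while a team is still learning the DSL.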
Alternation and Grouping
Use ChoiceOf for alternation (the | operator in traditional regex):
let studioRegex = Regex {
    ChoiceOf {
        "Pixar"
        "Disney"
        "Marvel"
        "Lucasfilm"
    }
}
let line = "studio:Pixar"
if let match = line.firstMatch(of: studioRegex) {
    print(match.output) // Pixar
}
Pixar
ChoiceOf keeps the output typed. When every branch is a plain string, as here, the output is a Substring. When branches
contain captures, each capture appears in the output tuple as an optional, because only one branch can match at a time.
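One way to get a single, non-optional value out of an alternation is to wrap the whole ChoiceOf in a TryCapture and map the matched text to a domain type. The sketch below assumes a hypothetical `Studio` enum that is not part of the original parser.

```swift
import RegexBuilder

// Hypothetical domain type, introduced only for this illustration.
enum Studio: String {
    case pixar = "Pixar"
    case disney = "Disney"
    case marvel = "Marvel"
}

let typedStudio = Regex {
    "studio:"
    TryCapture {
        ChoiceOf {
            "Pixar"
            "Disney"
            "Marvel"
        }
    } transform: { name -> Studio? in
        Studio(rawValue: String(name))
    }
}
// Type: Regex<(Substring, Studio)>

if let match = "studio:Disney".firstMatch(of: typedStudio) {
    print(match.output.1)
}
```

Call sites then switch over `Studio` rather than comparing strings.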
Anchors and Boundaries
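Anchor.startOfLine and Anchor.endOfLine match at every line boundary, like ^ and $ with the multiline (?m) option, and
Anchor.wordBoundary corresponds to \b. To anchor to the whole input instead, use Anchor.startOfSubject and Anchor.endOfSubject: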
let lineStartRegex = Regex {
    Anchor.startOfLine
    "["
    Repeat(.digit, count: 4)
}
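To see the anchor at work, here is a small sketch that scans a multi-line string and keeps only entries whose bracket sits at the start of a line. The names `entryStart` and `log` and the sample text are illustrative assumptions.

```swift
import RegexBuilder

let entryStart = Regex {
    Anchor.startOfLine
    "["
    Capture {
        Repeat(.digit, count: 4)
    }
}

let log = """
[1995-11-22] RELEASE "Toy Story"
note: see [2003-05-30] for the next entry
[2003-05-30] RELEASE "Finding Nemo"
"""

// The bracketed date in the middle line is not at a line start, so it is skipped.
let years = log.matches(of: entryStart).map { String($0.output.1) }
print(years)
```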
Typed Captures and TryCapture
This is where RegexBuilder truly shines. Captures are not untyped substrings — they are generic type parameters baked
into the Regex<Output> type itself.
Basic Capture
Wrap any component in Capture to extract its matched text:
import RegexBuilder
let movieTitleRegex = Regex {
    "\""
    Capture {
        OneOrMore(.reluctant) {
            /./
        }
    }
    "\""
}
// Type: Regex<(Substring, Substring)>
let input = #"RELEASE "Finding Nemo" studio:Pixar"#
if let match = input.firstMatch(of: movieTitleRegex) {
    let fullMatch: Substring = match.output.0
    let title: Substring = match.output.1
    print(title)
}
Finding Nemo
The compiler knows match.output is a tuple of (Substring, Substring) — the full match plus one captured substring.
You cannot accidentally access .output.2 because it does not exist.
TryCapture with Transforms
TryCapture accepts a transform closure that converts the matched substring into a specific type. If the transform
returns nil, the regex engine backtracks — exactly like a failed lookahead.
import RegexBuilder
import Foundation
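struct BoxOfficeEntry {
    let title: String
    let gross: Int
}
let grossRegex = Regex {
    "gross:$"
    TryCapture {
        // Match only digits and commas. A reluctant OneOrMore(.any) at the
        // end of a pattern would stop after a single character.
        OneOrMore {
            CharacterClass(.digit, .anyOf(","))
        }
    } transform: { matched -> Int? in
        let cleaned = matched.replacing(",", with: "")
        return Int(cleaned)
    }
}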
// Type: Regex<(Substring, Int)>
The output type of this regex is Regex<(Substring, Int)> — not (Substring, Substring). The transform is encoded in
the type system. If the closure returns nil (say, the matched text is not a valid integer), the engine backtracks and
tries alternative matches rather than crashing or returning garbage.
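A transform can also act as a validator. In the sketch below, which uses a hypothetical `month:` field that is not part of the log format, the closure returns nil for out-of-range values, so the regex simply does not match.

```swift
import RegexBuilder

let monthField = Regex {
    "month:"
    TryCapture {
        Repeat(.digit, count: 2)
    } transform: { text -> Int? in
        // Returning nil rejects this capture, so the overall match fails.
        guard let value = Int(text), (1...12).contains(value) else { return nil }
        return value
    }
}

print("month:06".firstMatch(of: monthField)?.output.1 as Any)
print("month:13".firstMatch(of: monthField) == nil)
```

Keeping the range check inside the regex means callers never see an invalid month typed as Int.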
Putting It All Together
Now let us build the complete log-line parser. Compare this to the NSRegularExpression version from earlier:
import RegexBuilder
import Foundation
let logLineRegex = Regex {
    "["
    Capture {
        Repeat(.digit, count: 4)
        "-"
        Repeat(.digit, count: 2)
        "-"
        Repeat(.digit, count: 2)
    }
    "] RELEASE \""
    Capture {
        OneOrMore(.reluctant) {
            /./
        }
    }
    "\" studio:"
    Capture {
        OneOrMore(.word)
    }
    " gross:$"
    TryCapture {
        OneOrMore {
            CharacterClass(
                .digit,
                .anyOf(",")
            )
        }
    } transform: { matched -> Int? in
        Int(matched.replacing(",", with: ""))
    }
}
// Type: Regex<(Substring, Substring, Substring, Substring, Int)>
let logLine = #"[2023-06-16] RELEASE "Inside Out 2" studio:Pixar gross:$1,566,380,600"#
if let match = logLine.firstMatch(of: logLineRegex) {
    let (_, date, title, studio, gross) = match.output
    print("Date: \(date)")
    print("Title: \(title)")
    print("Studio: \(studio)")
    print("Gross: \(gross)")
}
Date: 2023-06-16
Title: Inside Out 2
Studio: Pixar
Gross: 1566380600
Four captures, each with a type the compiler knows (three Substring values and one Int), and zero index juggling. If
you add or remove a capture, or change a transform's return type, every tuple destructuring that no longer fits stops
compiling. That is the value proposition of RegexBuilder in one example.
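From here it is a short step to mapping matches into a model type. The following is a minimal sketch under stated assumptions: `ReleaseRecord`, `releaseLine`, and `parseRecords` are hypothetical names, and the regex is a trimmed-down variant that only extracts the title and gross.

```swift
import RegexBuilder

// Hypothetical model type for this sketch.
struct ReleaseRecord: Equatable {
    let title: String
    let gross: Int
}

let releaseLine = Regex {
    "RELEASE \""
    Capture { OneOrMore(.any, .reluctant) }   // title, up to the closing quote
    "\""
    ZeroOrMore(.any, .reluctant)              // skip fields we do not need
    "gross:$"
    TryCapture {
        OneOrMore { CharacterClass(.digit, .anyOf(",")) }
    } transform: { text -> Int? in
        Int(text.replacing(",", with: ""))
    }
}

// Lines that do not match are dropped rather than reported.
func parseRecords(_ lines: [String]) -> [ReleaseRecord] {
    lines.compactMap { line -> ReleaseRecord? in
        guard let match = line.firstMatch(of: releaseLine) else { return nil }
        let (_, title, gross) = match.output
        return ReleaseRecord(title: String(title), gross: gross)
    }
}
```

Silently dropping malformed lines is a design choice for the sketch; a production parser would more likely collect or log them.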
Apple Docs: RegexBuilder (Swift Standard Library)
Advanced Usage
Reusable Components with RegexComponent
Any type conforming to RegexComponent can be
dropped into a builder block. This lets you build a library of domain-specific parsers:
import RegexBuilder
import Foundation
struct PixarDateParser: RegexComponent {
    // RegexComponent's single requirement is a `regex` property;
    // RegexOutput is inferred from it as (Substring, Date).
    var regex: Regex<(Substring, Date)> {
        Regex {
            TryCapture {
                Repeat(.digit, count: 4)
                "-"
                Repeat(.digit, count: 2)
                "-"
                Repeat(.digit, count: 2)
            } transform: { substring -> Date? in
                let formatter = ISO8601DateFormatter()
                formatter.formatOptions = [.withFullDate]
                return formatter.date(from: String(substring))
            }
        }
    }
}
// Usage: drop it into any builder block
let entryRegex = Regex {
    "["
    PixarDateParser()
    "]"
}
// Type: Regex<(Substring, Date)>
The PixarDateParser is testable in isolation, reusable across patterns, and its Date capture propagates into the
output type of any regex that includes it. This is composition at its best.
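To make "testable in isolation" concrete, here is a sketch with a second, hypothetical component, `GrossAmount`. It satisfies RegexComponent through the protocol's `regex` property, is exercised on its own with wholeMatch(of:), and is then composed into a larger pattern.

```swift
import RegexBuilder

// Hypothetical component for this sketch; not part of the parser above.
struct GrossAmount: RegexComponent {
    var regex: Regex<(Substring, Int)> {
        Regex {
            "$"
            TryCapture {
                OneOrMore { CharacterClass(.digit, .anyOf(",")) }
            } transform: { text -> Int? in
                Int(text.replacing(",", with: ""))
            }
        }
    }
}

// In isolation: match the component directly against sample input.
let isolated = "$940,335,536".wholeMatch(of: GrossAmount())?.output.1

// Composed: the Int capture carries into the enclosing regex's output.
let grossLine = Regex {
    "gross:"
    GrossAmount()
}
let composed = "gross:$807,082,196".firstMatch(of: grossLine)?.output.1
print(isolated as Any, composed as Any)
```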
Reluctant and Possessive Quantifiers
By default, quantifiers in RegexBuilder are greedy. You control this with the .reluctant and .possessive
repetition behaviors:
import RegexBuilder
// Greedy: matches as much as possible, then backtracks
let greedy = OneOrMore(.any)
// Reluctant: matches as little as possible
let reluctant = OneOrMore(.any, .reluctant)
// Possessive: matches as much as possible, never backtracks
let possessive = OneOrMore(.any, .possessive)
Warning: Possessive quantifiers (.possessive) never backtrack. Use them only when you are certain the matched segment
cannot overlap with what follows. Getting this wrong does not crash; it just silently fails to match. Prefer .reluctant
when parsing delimited content like quoted strings.
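The silent failure is easy to reproduce. In this sketch (the names `possessiveTitle`, `reluctantTitle`, and `quoted` are illustrative), the possessive version swallows the closing quote and can never give it back, while the reluctant version stops at the first closing quote.

```swift
import RegexBuilder

let possessiveTitle = Regex {
    "\""
    Capture { OneOrMore(.any, .possessive) }  // consumes to the end, keeps it
    "\""
}

let reluctantTitle = Regex {
    "\""
    Capture { OneOrMore(.any, .reluctant) }   // expands one character at a time
    "\""
}

let quoted = #""Coco" studio:Pixar"#
print(quoted.firstMatch(of: possessiveTitle) == nil)
print(quoted.firstMatch(of: reluctantTitle)?.output.1 as Any)
```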
References and Named Captures
RegexBuilder supports back-references through the Reference type. This is useful when you need to match a repeated
element:
import RegexBuilder
let quoteRef = Reference(Substring.self)
let quotedStringRegex = Regex {
    Capture(as: quoteRef) {
        ChoiceOf {
            "\""
            "'"
        }
    }
    OneOrMore(.reluctant) {
        /./
    }
    quoteRef // Must match the same quote character
}
let input = #"'Buzz Lightyear'"#
if let match = input.firstMatch(of: quotedStringRegex) {
    print(match.output.0) // 'Buzz Lightyear'
}
'Buzz Lightyear'
The Reference ensures the closing delimiter matches the opening one: single quote matches single quote, double
matches double. A traditional pattern does the same with a numbered back-reference such as \1, which silently points
at the wrong group if the pattern is later reordered.
Integration with Swift’s String Processing APIs
Regex values integrate natively with Swift’s string processing methods. You are not limited to firstMatch(of:):
import RegexBuilder
let priceRegex = Regex {
    "$"
    Capture {
        OneOrMore {
            CharacterClass(.digit, .anyOf(","))
        }
    }
}
let report = """
Toy Story: $373,554,033
Finding Nemo: $940,335,536
Coco: $807,082,196
"""
// All matches
let matches = report.matches(of: priceRegex)
for match in matches {
    print(match.output.1)
}
// Replace
let redacted = report.replacing(priceRegex, with: "$[REDACTED]")
// Split
let segments = "Woody|Buzz|Jessie".split(separator: /\|/)
373,554,033
940,335,536
807,082,196
These regex-powered algorithms were specified in SE-0357, a companion proposal to SE-0350, and work on both String
and Substring.
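A few more entry points are worth knowing. The sketch below runs one small literal (the name `year` and the sample text are assumptions for illustration) through contains, prefixMatch(of:), wholeMatch(of:), ranges(of:), and trimmingPrefix.

```swift
let year = #/\d{4}/#
let text = "1995: Toy Story, 2003: Finding Nemo"

let hasYear = text.contains(year)                  // is there a match anywhere?
let leading = text.prefixMatch(of: year)?.output   // match anchored at the start
let whole = text.wholeMatch(of: year)              // must cover the entire string
let count = text.ranges(of: year).count            // every non-overlapping match
let rest = text.trimmingPrefix(year)               // drop a leading match

print(hasYear, leading as Any, whole == nil, count, rest)
```

Choosing wholeMatch(of:) over firstMatch(of:) is often the simplest way to validate that an input contains nothing but the expected format.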
Performance Considerations
Regex literals are parsed and type-checked at compile time, but the matching itself happens at runtime. Here are
the performance characteristics you should know:
Compilation cost. A Regex value is built when its literal or builder expression is evaluated, so a regex declared
inside a function body is rebuilt on every call. Store it in a static property or top-level constant if you are
matching inside a tight loop:
import RegexBuilder
enum MovieLogParser {
    // Built once on first access, then reused across calls
    static let logRegex = Regex {
        "["
        Capture { OneOrMore(.digit) }
        "]"
    }
    static func parseAll(_ lines: [String]) -> [Substring] {
        lines.compactMap { line in
            line.firstMatch(of: logRegex)?.output.1
        }
    }
}
Backtracking. Complex patterns with nested quantifiers can exhibit exponential backtracking, just like any regex
engine. The .possessive quantifier and atomic groups (Local { ... }) prevent unnecessary backtracking:
import RegexBuilder
// Atomic group — once matched, the engine will not backtrack into this block
let efficientPattern = Regex {
    Local {
        OneOrMore(.digit)
    }
    ";"
}
Unicode correctness. Swift regexes operate on Character (extended grapheme cluster) boundaries by default, which
is correct but can be slower than byte-level matching. If you are processing ASCII-only data, the asciiOnlyDigits() and
asciiOnlyWordCharacters() modifiers on Regex restrict .digit and .word to ASCII, and matchingSemantics(.unicodeScalar)
switches matching to the Unicode scalar level. Measure before assuming either one is faster for your input.
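The behavioral difference is easiest to see with non-ASCII digits. This sketch applies the standard library's asciiOnlyDigits() modifier to a builder regex; the names `digits` and `arabicIndic` are illustrative, and the snippet demonstrates matching semantics only, not speed.

```swift
import RegexBuilder

let digits = Regex {
    OneOrMore(.digit)
}

// "٤٢" written with ARABIC-INDIC DIGIT FOUR and TWO.
let arabicIndic = "\u{0664}\u{0662}"

let unicodeAware = arabicIndic.contains(digits)                 // Unicode digits count
let asciiOnly = arabicIndic.contains(digits.asciiOnlyDigits())  // only 0-9 count
print(unicodeAware, asciiOnly)
```

If the input is user-facing text, the Unicode-aware default is usually the behavior you want; the ASCII-only modifiers fit machine-generated formats like the log lines in this post.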
Tip: Profile with Instruments’ Time Profiler before micro-optimizing regex performance. In most apps, the cost of regex matching is negligible compared to network I/O or view layout. Optimize only when the profiler tells you to.
When to Use (and When Not To)
| Scenario | Recommendation |
|---|---|
| Parsing structured logs with multiple fields | Use RegexBuilder. Typed captures and composable components pay for themselves immediately. |
| Simple one-off string checks (e.g., “does this contain a digit?”) | Use a regex literal (/\d/). The builder DSL adds ceremony without benefit for trivial patterns. |
| Parsing well-defined formats like JSON or XML | Do not use regex at all. Use JSONDecoder, XMLParser, or a dedicated parser. Regex cannot handle recursive grammars. |
| Validating user input (email, phone) | Regex literals are fine for quick validation. For production email validation, prefer NSDataDetector or server-side checks. |
| Processing CSV or fixed-width tabular data | RegexBuilder works well here, especially with TryCapture for numeric conversions. Consider Scanner for simpler cases. |
| Migrating from NSRegularExpression | Replace incrementally. RegexBuilder and NSRegularExpression can coexist in the same codebase during migration. |
RegexBuilder requires iOS 16+ / macOS 13+ (Swift 5.7 runtime). If your deployment target is earlier, you are stuck
with NSRegularExpression. There is no backport available.
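If you support both old and new OS versions, one option is to gate the Regex path behind an availability check and keep NSRegularExpression as the fallback. The following is a sketch of that idea; `firstYear(in:)` is a hypothetical helper, not an API from either framework.

```swift
import Foundation

// Hypothetical helper: returns the first four-digit run in the text.
func firstYear(in text: String) -> String? {
    if #available(macOS 13.0, iOS 16.0, tvOS 16.0, watchOS 9.0, *) {
        // Swift Regex path, available on newer OS versions.
        let year = #/\d{4}/#
        return text.firstMatch(of: year).map { String($0.output) }
    } else {
        // Fallback for older deployment targets.
        guard let legacy = try? NSRegularExpression(pattern: #"\d{4}"#),
              let match = legacy.firstMatch(
                  in: text,
                  range: NSRange(text.startIndex..., in: text)
              ),
              let range = Range(match.range, in: text)
        else { return nil }
        return String(text[range])
    }
}

print(firstYear(in: "Released 1995-11-22") as Any)
```

Hiding the check behind one function keeps call sites identical on every OS version and gives you a single place to delete the fallback later.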
Note: RegexBuilder was introduced in Swift 5.7 alongside SE-0351 and demonstrated in the WWDC 2022 sessions Meet Swift
Regex and Swift Regex: Beyond the Basics. Both sessions are worth watching for the design rationale and live demos.
Summary
- RegexBuilder is a result-builder DSL that makes regular expressions type-safe, composable, and readable, with no more counting parentheses in opaque pattern strings.
- Typed captures encode the output type directly in Regex<Output>. The compiler prevents index mismatches and type confusion at every call site.
- TryCapture with transforms converts matched text into domain types (Int, Date, custom types) inline, with automatic backtracking on failure.
- Custom RegexComponent types let you build reusable, testable parsing components that compose cleanly inside any builder block.
- Performance is predictable: store compiled regex values as static properties, use .possessive or Local to avoid backtracking, and profile before optimizing.
If you found the result-builder mechanics behind RegexBuilder interesting, explore how the same @resultBuilder
attribute powers SwiftUI and custom DSLs in Result Builders in Swift. For patterns
where you need to decode structured data from network responses rather than raw strings, see
Mastering Codable in Swift.