Quick answer
Use (?<name>...) to name a group. Access matches via match.groups.name in JavaScript, m.group("name") in Python, $+{name} in Perl, and $<name> in replacements. The cost is three extra characters; the benefit is a regex that documents itself.
Numbered capture groups ($1, $2, $3) made sense in 1970. Today, every modern regex engine supports named groups, and they make patterns dramatically easier to read three months later.
// Named groups are supported in JS since ES2018
const re = /(?<protocol>https?):\/\/(?<host>[^\/]+)(?<path>\/[^?]*)?/;
const url = 'https://peakproductivity.online/regex-tester-pro/seo/';
const m = url.match(re);
console.log(m.groups);
// {
// protocol: 'https',
// host: 'peakproductivity.online',
// path: '/regex-tester-pro/seo/'
// }
// Replacements also use named refs:
url.replace(re, '$<host>$<path>');
// 'peakproductivity.online/regex-tester-pro/seo/'
The problem with numbered groups
A regex with three or more groups quickly becomes unreadable. (\d{4})-(\d{2})-(\d{2})\s+(\d{2}):(\d{2}):(\d{2}) works, but you have to count to six to find the seconds. Reorder the groups and every $1, $2 reference downstream needs to change.
Named groups solve both. (?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})\s+(?<hour>\d{2}):(?<min>\d{2}):(?<sec>\d{2}) reads like a labeled tuple. The match object has a groups property with each name as a key. Reordering the regex does not break consumer code.
Syntax across languages
The accepted syntax for naming a group:
- JavaScript (ES2018+):
(?<name>...). Access viamatch.groups.name. Replacement reference$<name>. - Python:
(?P<name>...). Access viam.group("name"). Replacement reference\g<name>. (Modern Python also accepts(?<name>...)in the regex module.) - PCRE / Perl:
(?<name>...)or(?P<name>...). Access via$+{name}. - .NET:
(?<name>...). Access viamatch.Groups["name"].Value.
The Python (?P<name>...) syntax came first historically; the rest of the ecosystem standardized on (?<name>...) later. If you write polyglot regex, prefer the modern form.
Backreferences inside the pattern
A named group can reference itself later in the same regex. Useful for matching balanced quotes or repeated tokens:
(?<quote>["']).+?\k<quote>matches a quoted string where opening and closing quote characters are identical. "hello" matches; "hello' does not.<(?<tag>\w+)[^>]*>.*?<\/\k<tag>>matches a complete HTML element where opening and closing tag names match.
The backreference syntax depends on the engine: \k<name> in JS/PCRE/.NET, (?P=name) in Python's re.
In replacements, names beat numbers
Replacement strings can reference captured groups by name as well as number. Same example, two ways:
- By number:
'$2/$3/$1' - By name:
'$<month>/$<day>/$<year>'
The named version survives a regex refactor. The numbered version silently breaks the moment you add a group earlier in the pattern. Anyone who has done date-format gymnastics for an internal report knows which one ages better.
Common mistakes
Three traps:
- Reusing a name.
(?<x>a)|(?<x>b)works in PCRE and .NET but throws a syntax error in JavaScript. Rename them or wrap in a non-capturing alternation. - Forgetting the question mark.
(name>\d+)is a literal character class plus group, not a named capture. The?after the opening parenthesis is required. - Mixing
(?P<and(?<across files. Stick to one per project. Modern Python supports both, but consistency makes grep easier.
Why named groups change how you write regex
Once you commit to naming every group, two things change:
- Patterns get longer but more legible. The extra characters per group are repaid in documentation. A reviewer can understand the pattern without running it.
- Refactoring becomes safe. Add a new group, remove an old one, reorder, the consumer code does not care because it accesses by name.
The investment is small (about three keystrokes per group) and the maintenance dividend compounds. The next time you find yourself debugging a regex from six months ago, named groups will be the difference between "oh that's the user id" and "what is $3 again".