PCRE2EZRegex
Official docs:
https://www.pcre.org/current/doc/html/pcre2syntax.html
https://www.pcre.org/current/doc/html/pcre2pattern.html
options
Usage:
word + options(ignore_case=True)
word + options('ignore_case')
word + options('ignore_case', 'multiline')
word + options('ignore_case', multiline=True)
Args:
global: Global mode. Match everything in the given string, instead of just the first match.
multiline: Not recommended. Makes the '^' and '$' special characters match the start and end of lines, instead of the start and end of the string. This is automatically inserted when using line_start and line_end, so you shouldn't need to add it manually.
ignore_case: Perform case-insensitive matching, including expressions that explicitly use uppercase members. Full Unicode matching (such as Ü matching ü) also works unless the ASCII flag is used to disable non-ASCII matches. The current locale does not change the effect of this flag unless the LOCALE flag is also used.
verbose: Not recommended. Allows for comments and whitespace, neither of which does anything in this library.
single_line: Not recommended. Makes the '.' special character match any character at all, including a newline. It's recommended you simply use literally_anything instead.
lazy: The engine will default to lazy matching, instead of greedy. It's recommended you just specify greedy=False instead.
duplicate_groups: Allows the regex to accept duplicate pattern names. Each capture group still has its own ID, so the two capture groups produce their own matches instead of a single combined one.
noncapturing: Not recommended. Disables capturing for all groups. Instead, simply don't use any groups.
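As a rough illustration of what global and ignore_case mean at the regex level, the sketch below uses Python's standard re module rather than this library. The sample text and variable names are made up for the example, and the PCRE2 engine may differ in details.

```python
import re

text = "cat, Cat, cat"

# Without "global", only the first match is reported.
first_only = re.search(r"cat", text).group()

# "global" corresponds to collecting every match in the string.
all_matches = re.findall(r"cat", text)

# "ignore_case" corresponds to the inline (?i) flag.
all_ignoring_case = re.findall(r"(?i)cat", text)

print(first_only, all_matches, all_ignoring_case)
```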
any_between
Aliases: amt_between, numBetween, num_between, anyBetween, amtBetween
Match any char between char and and_char, using the ASCII table for reference
Args:
char (str): the first character
and_char (str): the second character
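A minimal sketch of the idea, assuming any_between corresponds to a character-class range. The helper name between is hypothetical and Python's standard re module stands in for the library here.

```python
import re

def between(char: str, and_char: str) -> str:
    """Hypothetical helper: build a character class spanning char..and_char."""
    return f"[{re.escape(char)}-{re.escape(and_char)}]"

letters = between("a", "f")   # the class [a-f]
digits = between("0", "4")    # the class [0-4]

found_letters = re.findall(letters, "badge 42")
found_digits = re.findall(digits, "badge 42")
print(letters, found_letters, found_digits)
```

The range follows code-point order, so "g" falls outside a..f and is skipped.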
any_char_except
Aliases: anythingExcept, any_except, anyExcept, anyCharExcept, anything_except
This matches any char that is NOT in chars. chars can be multiple parameters,
or a single string of chars to split.
Args:
chars (str): the characters to exclude from the match
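At the regex level this presumably corresponds to a negated character class. The sketch below shows that behavior with Python's standard re module; it is an illustration of the concept, not the library's actual output.

```python
import re

# [^aeiou] matches any single character that is not one of a, e, i, o, u.
non_vowels = re.findall(r"[^aeiou]", "regex")
print(non_vowels)
```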
any_of
Aliases: anyof, oneof, oneOf, anyOf, one_of
Match any of the given patterns. Note that patterns can be multiple parameters,
or a single string. Also accepts the parameters chars and split. If chars is set
to True, then patterns must be a single string, which is interpreted
as characters and split up to match any one of the chars in the string. If
split is set to True, it forces the (?:...|...) regex syntax instead of the [...]
syntax. It should act the same way, but your output regex will look different.
By default, it picks the more compact form for you.
Args:
patterns: any of the patterns to match
chars (bool): whether to interpret patterns as characters (default: auto)
split (bool): whether to split patterns into characters (default: auto)
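The two output forms mentioned above can be compared directly. This sketch uses Python's standard re module and hand-written patterns (not generated by the library) to show that a character class and an alternation group match the same single characters, while multi-character patterns need the alternation form.

```python
import re

text = "a cab"

# Character-class form and alternation form match the same single characters.
as_class = re.findall(r"[abc]", text)
as_alternation = re.findall(r"(?:a|b|c)", text)

# Multi-character patterns can only be expressed as an alternation.
words = re.findall(r"(?:cat|dog)", "cat, dog, cow")

print(as_class, as_alternation, words)
```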
anything
Aliases: anychar, any_char, anyChar, char
Matches any single character, except a newline. To also match a newline, use literally_anything
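The newline exception can be seen with a plain '.' pattern. The sketch below uses Python's standard re module; the inline (?s) flag stands in for the "match newlines too" behavior and is not necessarily how literally_anything is implemented.

```python
import re

text = "a\nb"

# '.' skips the newline by default.
without_newline = re.findall(r".", text)

# With the single-line (dot-all) flag, '.' matches the newline as well.
with_newline = re.findall(r"(?s).", text)

print(without_newline, with_newline)
```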
at_least_none
Aliases: zero_or_more, atLeast0, any_amt, anyAmt, at_least_0, atLeastNone, noneOrMore, zeroOrMore, none_or_more
at_least_one
Aliases: atLeastOne, atLeast1, at_least_1, one_or_more, oneOrMore
chunk
Aliases: stuff
A "chunk": Any clump of characters up until the next newline
earlier_group
Aliases: same_as_group, same_as, sameAs, sameAsGroup, earlierGroup
Matches whatever the group referenced by num_or_name matched earlier. Must come
after the group that num_or_name refers to.
Args:
num_or_name (int | str): either the number or name of the previous group
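This describes a backreference. As a hedged illustration with Python's standard re module (hand-written patterns, not library output), the sketch below finds a doubled word by number and by name.

```python
import re

text = "this is is a test"

# Numbered backreference: \1 must repeat whatever group 1 captured.
numbered = re.search(r"\b(\w+) \1\b", text)

# Named backreference: (?P=word) must repeat whatever the group "word" captured.
named = re.search(r"\b(?P<word>\w+) (?P=word)\b", text)

print(numbered.group(), named.group("word"))
```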
either
Aliases: or_, or
Match either pattern or or_pattern. To choose between more than 2 things,
you can either chain multiple either calls, or use any_of. Note that
the order here matters: it first tries pattern, and if that doesn't
match, then it tries or_pattern.
Args:
pattern: a pattern to match
or_pattern: a pattern to match if the first one fails
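The ordering note is easy to demonstrate. In this sketch (Python's standard re module, hand-written patterns), the first alternative that matches wins, even when a later one would match more text.

```python
import re

# "ab" is tried first and succeeds, so "abc" is never attempted.
short_first = re.match(r"(?:ab|abc)", "abc").group()

# Swapping the order lets the longer alternative win.
long_first = re.match(r"(?:abc|ab)", "abc").group()

print(short_first, long_first)
```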
hex_digit
Aliases: hexDigit, hex
if_enclosed_with
Aliases: ifEnclosedWith, ifEnclosedBy, if_enclosed_by
if_not_proceded_by
Aliases: ifNotFollowedBy, ifNotProcededBy, if_not_followed_by
if_proceded_by
Aliases: if_followed_by, ifProcededBy, ifFollowedBy
is_exactly
Aliases: exactly, isExactly
letter
Aliases: alpha
Matches just a letter -- not numbers or _ like word_char
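To make the contrast concrete, here is a sketch with Python's standard re module. It assumes letter behaves like the ASCII class [A-Za-z] and word_char like \w; the library's actual patterns may differ.

```python
import re

text = "a_1b"

# Letters only: digits and the underscore are skipped.
letters = re.findall(r"[A-Za-z]", text)

# Word characters: letters, digits, and the underscore.
word_chars = re.findall(r"\w", text)

print(letters, word_chars)
```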
letter_num
Aliases: alphaNum, alphanum, alpha_num, letterNum
line_ends_with
Aliases: lineEndsWith, lineEnd, line_end
Matches a line if it ends with pattern
Args:
pattern: the pattern to match
line_starts_with
Aliases: line_start, lineStart, lineStartsWith
Matches a line if it starts with pattern
Args:
pattern: the pattern to match
match_at_least
Aliases: matchAtLeast, atLeast, at_least, matchMin, match_min
match_at_most
Aliases: atMost, matchAtMost, at_most
match_max
Aliases: matchMax, repeat
match_more_than
Aliases: more_than, matchMoreThan, moreThan, match_greater_than, matchGreaterThan
match_num
Aliases: amt, matchNum, num, matchAmt, match_amt
match_range
Aliases: matchRange, matchBetween, between, match_between
new_line
Aliases: newLine, newline
optional
Aliases: oneOrNone, one_or_none, opt
period
Aliases: dot
signed_integer
Aliases: signed, signed_int, integer, signedInt, signedInteger
A signed integer that also accepts e notation, like 123, -123e+10, or +123e-10
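One way such a pattern could look is sketched below with Python's standard re module. The regex and the helper name is_signed_integer are assumptions made for illustration, not the pattern the library actually emits.

```python
import re

# Assumed shape: optional sign, digits, optional exponent with its own optional sign.
SIGNED_INT = re.compile(r"[+-]?\d+(?:[eE][+-]?\d+)?")

def is_signed_integer(text: str) -> bool:
    """Hypothetical helper: True if the whole string fits the assumed pattern."""
    return SIGNED_INT.fullmatch(text) is not None

for sample in ["123", "-42", "+123e-10", "12.5", "e10"]:
    print(sample, is_signed_integer(sample))
```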
string_ends_with
Aliases: stringEnd, stringEndsWith, string_end
Matches the string if it ends with pattern
Args:
pattern: the pattern to match
string_starts_with
Aliases: stringStart, string_start, stringStartsWith
Matches the string if it starts with pattern
Args:
pattern: the pattern to match
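The difference between the string anchors and the line anchors can be sketched with Python's standard re module. The patterns are hand-written for illustration, and the end-of-string escapes behave slightly differently between engines, so treat this as an approximation of the PCRE2 behavior.

```python
import re

text = "alpha one\nalpha two"

# String anchors: only the very start or very end of the whole string.
string_start = re.findall(r"\Aalpha \w+", text)
string_end = re.findall(r"\w+\Z", text)

# Line anchors: with the multiline flag, '^' and '$' apply to every line.
line_start = re.findall(r"(?m)^alpha \w+", text)
line_end = re.findall(r"(?m)\w+$", text)

print(string_start, string_end, line_start, line_end)
```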
unsigned_integer
Aliases: unsigned_int, unsignedInt, unsignedInteger, unsigned
An unsigned integer that also accepts e notation, like 123 or +123e-10
white_char
Aliases: whitechar, whiteChar
whitechunk
Aliases: white_space, white_chunk, whiteChunk, whitespace, whiteSpace
A "chunk" of whitespace. Just any amount of whitespace together
Replacement EZRegexs
replace_entire
Aliases: replaceAll, replaceEntire, replace_all
Puts in its place the entire match
rgroup
Aliases: replaceGroup, replace_group
Puts in its place the group specified, either by group number (for unnamed groups) or group name (for named groups). Named groups are typically also counted by number; check your specific dialect's docs for details. Group 0 is handled specially by this function, so it always refers to the entire match, even if 0 doesn't mean the entire match in your dialect.
Args:
num_or_name (int | str): the number or name of the group you want to insert here
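For comparison, this is what group references look like in a raw replacement string in Python's standard re module. It illustrates the concept only; the replacement syntax this library emits for PCRE2 is not shown here.

```python
import re

# \2 refers to a group by number, \g<first> by name.
swapped = re.sub(r"(?P<first>\w+) (\w+)", r"\2 \g<first>", "hello world")

# \g<0> refers to the entire match.
wrapped = re.sub(r"\d+", r"<\g<0>>", "a1b22")

print(swapped, wrapped)
```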
replace
Generates a valid regex replacement string, using Python f-string like syntax.
Args:
string (str): the templated replacement string
compile (bool): whether to compile the string into an EZRegex subclass instance (default: True)
Example:
```
replace("named: {group}, numbered: {1}, entire: {0}")
```
Like Python f-strings, use {{ and }} to specify { and }
Set the `compile` parameter to False to have it return a plain string instead of an EZRegex subclass instance
Note: 0 is handled specially by this function, so it calls for the entire match,
even if 0 doesn't mean the entire match in your dialect.
There are a few advantages to using this instead of just the regular regex replacement syntax:
- It's consistent between dialects
- It's closer to Python f-string syntax, which is cleaner and more familiar
- It handles numbered, named, and entire replacement types the same
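To show how an f-string-like template can map onto a concrete replacement syntax, here is a minimal sketch that targets Python's standard re module. The function to_python_replacement is hypothetical and is not part of this library; it only mirrors the idea described above, including {0} standing for the entire match and {{ }} for literal braces.

```python
import re
from string import Formatter

def to_python_replacement(template: str) -> str:
    """Hypothetical helper: turn "{name}" / "{1}" / "{0}" into re's \\g<...> syntax."""
    parts = []
    for literal, field, _spec, _conversion in Formatter().parse(template):
        # Escape backslashes so literal text is not read as a group reference.
        parts.append(literal.replace("\\", "\\\\"))
        if field is not None:
            # \g<0> is the whole match; other fields are group numbers or names.
            parts.append(rf"\g<{field}>")
    return "".join(parts)

repl = to_python_replacement("named: {group}, numbered: {2}, entire: {0}")
result = re.sub(r"(?P<group>[a-z]+)-(\d+)", repl, "abc-42")
print(repl)
print(result)
```

Note that the named group is also group 1 here, which is why the numbered reference in the example uses {2} for the digits.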