Notes and Gotchas

  • The different Regular Expression dialects don't all have the same features, and those features don't all work the same way. I've tried to standardize these as best I can and use reasonable names for all the elements. If you're confused by something not working as expected, check your dialect's documentation for details.
  • Be careful to call methods on the entire pattern: chunk + word.str() is not the same as (chunk + word).str().
  • In ordinary regex syntax, many constructs create capture groups as a side effect (any plain pair of parentheses does). All regexes in EZRegex intentionally capture passively, so to capture any groups, use group(), with the optional name parameter.
    • The name parameter is keyword only, for clarity, and for complicated internal reasons.
  • EZRegexs are not particularly compact. They're designed to be functional and accurate, but they're not going to be as compact as a handwritten regex.
  • All EZRegexs (except for raw) auto-sanitize strings given to them, so there's no need to escape characters or use r strings. This does mean, however, that you cannot pass actual regex strings to any of them, as they'll treat the string as literal text (unless you want that, of course). To include already written regex strings, use raw.
  • The pattern parameters can accept strings, other EZRegexs, or entire sequences of EZRegex patterns. They can also accept things that can be cast to a string, but those are not sanitized when cast and a warning is raised, so it's better to cast to a string yourself.
  • Note that I have both camelCase and snake_case versions of each of the EZRegexs, because different languages have different conventions. Both versions function identically. There are also additional names for some of the EZRegexs; see psuedonyms.py for specifics.
  • The invert function can accept any regular expression, not just EZRegex expressions, if you want to use it independently of the rest of the library. It can only invert Python regexs, however.
  • When using EZRegexs that modify an existing chain, they apply to everything chained together with the . operator, and not to anything joined with +. For example:
    • digit + whitechunk.opt == digit + optional(whitespace)
    • digit.whitechunk.opt == optional(digit + whitechunk)
  • whitespace is an alias for whitechunk (\s+), not white_char (\s).
  • Due to the code structure of the library, type hints and docstrings are impossible. Refer to the documentation for details.
  • EZRegexs are immutable.
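
Several of the points above come down to ordinary Python and regex behaviour, so they can be illustrated with nothing but the standard library re module. The sketch below does not use EZRegex itself. The Pat class is a hypothetical stand-in, not the library's implementation, and is only there to show why a method call binds to the last operand rather than to the whole + expression, and what "immutable" looks like in practice.

```python
import re

# Passive (non-capturing) vs. capturing groups in plain `re`.
# (?:...) groups without capturing; (...) and (?P<name>...) capture.
passive = re.search(r"(?:ab)+(\d)", "abab7")
captured = passive.groups()            # only the explicit (\d) group is captured
named = re.search(r"(?P<digit>\d)", "abab7").group("digit")

# Escaping a literal string, which is roughly what auto-sanitizing means.
literal = re.escape("1+1=2")
escaped_match = re.fullmatch(literal, "1+1=2")   # matches the literal text
raw_match = re.fullmatch("1+1=2", "1+1=2")       # '+' acts as a quantifier, so no match

# \s matches one whitespace character, \s+ matches a whole run of them.
one_char = re.match(r"\s", "   x").group()
chunk = re.match(r"\s+", "   x").group()


class Pat:
    """Hypothetical immutable pattern wrapper (NOT EZRegex's implementation)."""

    def __init__(self, src):
        self._src = src

    def __add__(self, other):
        # Returns a new object; neither operand is changed.
        return Pat(self._src + other._src)

    def optional(self):
        return Pat(f"(?:{self._src})?")

    def str(self):
        return self._src


digit, space = Pat(r"\d"), Pat(r"\s+")

# Attribute access binds tighter than +, so the method applies to `space` only.
part = (digit + space.optional()).str()
# Parenthesizing first applies the method to the combined pattern.
whole = (digit + space).optional().str()
```

Here part is the pattern \d(?:\s+)? while whole is (?:\d\s+)?, which is the same distinction as chunk + word.str() versus (chunk + word).str().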

lazy_check_params

The EZRegex class has a class member called lazy_check_params which, when enabled, delays parameter checking until the regex is compiled. By default, all parameters are checked against their functions using inspect.signature as soon as they are passed in. Turning this on can be useful for performance, in theory, but I can't think of a practical situation where you would legitimately need to use it. But it's there if you want it.
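
To make the difference between eager and deferred checking concrete, here is a minimal sketch of the general idea using inspect.signature. Everything in it (the LazyChecked class, its compile method, and the repeat function) is hypothetical and written for illustration only. It is not how EZRegex is implemented internally, and only the lazy_check_params name is taken from the library.

```python
import inspect


def check_params(func, *args, **kwargs):
    """Raise TypeError if the arguments do not fit func's signature."""
    inspect.signature(func).bind(*args, **kwargs)


class LazyChecked:
    """Hypothetical wrapper showing eager vs. deferred parameter checks."""

    lazy_check_params = False  # class-level switch, shared by all instances

    def __init__(self, func, *args, **kwargs):
        self._call = (func, args, kwargs)
        if not self.lazy_check_params:
            check_params(func, *args, **kwargs)  # eager: fail at construction

    def compile(self):
        func, args, kwargs = self._call
        if self.lazy_check_params:
            check_params(func, *args, **kwargs)  # deferred: fail at compile
        return func(*args, **kwargs)


def repeat(text, times):
    """Toy function standing in for a pattern-building function."""
    return text * times


# Eager mode: the missing `times` argument is reported immediately.
try:
    LazyChecked(repeat, "ab")
    eager_error = False
except TypeError:
    eager_error = True

# Lazy mode: construction succeeds, the error surfaces on compile().
LazyChecked.lazy_check_params = True
deferred = LazyChecked(repeat, "ab")
try:
    deferred.compile()
    lazy_error = False
except TypeError:
    lazy_error = True
```

In both modes the bad call is eventually rejected. The switch only changes when the check runs, which is why it matters mostly for performance rather than correctness.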