Friday, August 9, 2013

Is there a way to define custom shorthands in regular expressions?

I have a regular expression of the following form:
    def parse(self, format_string):
        for m in re.finditer(
                r"""(?: \$ \( ( [^)]+ ) \) )   # the field access specifier
                  | (
                        (?:
                            \n | . (?= \$ \( )   # any one single character before the '$('
                        )
                      | (?:
                            \n | . (?! \$ \( )   # any one single character, except the one before the '$('
                        )*
                    )""",
                format_string,
                re.VERBOSE):
            ...
I would like to replace every occurrence of the repeated sequence \$ \( with a
custom shorthand "constant", so that the code would look like this:
    def parse(self, format_string):
        re.<something>('\BEGIN = \$\(')
        for m in re.finditer(
                r"""(?: \BEGIN ( [^)]+ ) \) )   # the field access specifier
                  | (
                        (?:
                            \n | . (?= \BEGIN )   # any one single character before the '$('
                        )
                      | (?:
                            \n | . (?! \BEGIN )   # any one single character, except the one before the '$('
                        )*
                    )""",
                format_string,
                re.VERBOSE):
            ...
Is there a way to do this with regular expressions themselves (i.e. not
using Python's string formatting to substitute \BEGIN with \$\()?
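
For comparison, here is a minimal sketch of the string-formatting fallback that I would like to avoid. The names BEGIN, PATTERN and tokenize are hypothetical and exist only for this illustration; the pattern is the same one as above, with the placeholder {BEGIN} filled in by str.format before the pattern is compiled.

```python
import re

# Hypothetical shorthand: BEGIN stands for the escaped '$(' opener.
BEGIN = r"\$ \("

# The placeholder {BEGIN} is expanded by str.format, not by the regex engine.
PATTERN = re.compile(
    r"""(?: {BEGIN} ( [^)]+ ) \) )   # the field access specifier
      | (
            (?:
                \n | . (?= {BEGIN} )   # any one single character before the '$('
            )
          | (?:
                \n | . (?! {BEGIN} )   # any one single character, except the one before the '$('
            )*
        )""".format(BEGIN=BEGIN),
    re.VERBOSE,
)


def tokenize(format_string):
    """Split format_string into ("field", name) and ("text", chunk) pairs."""
    tokens = []
    for m in PATTERN.finditer(format_string):
        if m.group(1) is not None:
            tokens.append(("field", m.group(1)))
        elif m.group(2):
            # Empty matches (e.g. at the end of the string) are skipped.
            tokens.append(("text", m.group(2)))
    return tokens


print(tokenize("Hello $(name)!"))
# [('text', 'Hello'), ('text', ' '), ('field', 'name'), ('text', '!')]
```

This works, but the substitution happens outside the regular expression, and any literal braces in the pattern (for example a quantifier such as {2,3}) would have to be doubled to survive str.format.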
