TY - GEN
T1 - On Lookaheads in Regular Expressions with Backreferences
AU - Chida, Nariyoshi
AU - Terauchi, Tachio
N1 - Funding Information:
This work was supported by JSPS KAKENHI Grant Numbers 17H01720, 18K19787, 20H04162, 20K20625, and 22H03570.
Funding Information:
Funding This work was supported by JSPS KAKENHI Grant 20H04162, 20K20625, and 22H03570.
Publisher Copyright:
© Nariyoshi Chida and Tachio Terauchi
PY - 2022/6/1
Y1 - 2022/6/1
N2 - Many modern regular expression engines employ various extensions to give more expressive support for real-world usages. Among the major extensions employed by many of the modern regular expression engines are backreferences and lookaheads. A question of interest about these extended regular expressions is their expressive power. Previous works have shown that (i) the extension by lookaheads does not enhance the expressive power, i.e., the expressive power of regular expressions with lookaheads is still regular, and that (ii) the extension by backreferences enhances the expressive power, i.e., the expressive power of regular expressions with backreferences (abbreviated as rewb) is no longer regular. This raises the following natural question: Does the extension of regular expressions with backreferences by lookaheads enhance the expressive power of regular expressions with backreferences? This paper answers the question positively by proving that adding either positive lookaheads or negative lookaheads increases the expressive power of rewb (the former abbreviated as rewblp and the latter as rewbln). A consequence of our result is that neither the class of finite state automata nor that of memory automata (MFA) of Schmid [14] (which corresponds to regular expressions with backreferenes but without lookaheads) corresponds to rewblp or rewbln. To fill the void, as a first step toward building such automata, we propose a new class of automata called memory automata with positive lookaheads (PLMFA) that corresponds to rewblp. The key idea of PLMFA is to extend MFA with a new kind of memories, called positive-lookahead memory, that is used to simulate the backtracking behavior of positive lookaheads. Interestingly, our positive-lookahead memories are almost perfectly symmetric to the capturing-group memories of MFA. Therefore, our PLMFA can be seen as a natural extension of MFA that can be obtained independently of its original intended purpose of simulating rewblp.
AB - Many modern regular expression engines employ various extensions to give more expressive support for real-world usages. Among the major extensions employed by many of the modern regular expression engines are backreferences and lookaheads. A question of interest about these extended regular expressions is their expressive power. Previous works have shown that (i) the extension by lookaheads does not enhance the expressive power, i.e., the expressive power of regular expressions with lookaheads is still regular, and that (ii) the extension by backreferences enhances the expressive power, i.e., the expressive power of regular expressions with backreferences (abbreviated as rewb) is no longer regular. This raises the following natural question: Does the extension of regular expressions with backreferences by lookaheads enhance the expressive power of regular expressions with backreferences? This paper answers the question positively by proving that adding either positive lookaheads or negative lookaheads increases the expressive power of rewb (the former abbreviated as rewblp and the latter as rewbln). A consequence of our result is that neither the class of finite state automata nor that of memory automata (MFA) of Schmid [14] (which corresponds to regular expressions with backreferenes but without lookaheads) corresponds to rewblp or rewbln. To fill the void, as a first step toward building such automata, we propose a new class of automata called memory automata with positive lookaheads (PLMFA) that corresponds to rewblp. The key idea of PLMFA is to extend MFA with a new kind of memories, called positive-lookahead memory, that is used to simulate the backtracking behavior of positive lookaheads. Interestingly, our positive-lookahead memories are almost perfectly symmetric to the capturing-group memories of MFA. Therefore, our PLMFA can be seen as a natural extension of MFA that can be obtained independently of its original intended purpose of simulating rewblp.
KW - Backreferences
KW - Lookaheads
KW - Memory automata
KW - Regular expressions
UR - http://www.scopus.com/inward/record.url?scp=85133707915&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85133707915&partnerID=8YFLogxK
U2 - 10.4230/LIPIcs.FSCD.2022.15
DO - 10.4230/LIPIcs.FSCD.2022.15
M3 - Conference contribution
AN - SCOPUS:85133707915
T3 - Leibniz International Proceedings in Informatics, LIPIcs
BT - 7th International Conference on Formal Structures for Computation and Deduction, FSCD 2022
A2 - Felty, Amy P.
PB - Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing
T2 - 7th International Conference on Formal Structures for Computation and Deduction, FSCD 2022
Y2 - 2 August 2022 through 5 August 2022
ER -