Обсуждение: Psql regex is NFA or DFA?
So I've finished reading Jeffery Friedl's _Mastering Regular Expressions_ and while I don't need regex in PostgreSQL I know I'll do it for something - eventually. The book makes a distinction between DFA, POSIX NFA and Traditional NFA and then ascribes some properties and behaviours to each. So what sort does PostgreSQL have? Joshua b. Jore -{ weird geeky madness }-> http://www.greentechnologist.org
On Tue, Sep 10, 2002 at 02:16:51AM -0500, Josh Jore wrote: > So I've finished reading Jeffery Friedl's _Mastering Regular Expressions_ > and while I don't need regex in PostgreSQL I know I'll do it for something > - eventually. The book makes a distinction between DFA, POSIX NFA and > Traditional NFA and then ascribes some properties and behaviours to each. > So what sort does PostgreSQL have? Regex in PostgreSQL code: Copyright (c) 1992, 1993, 1994 Henry Spencer. Copyright (c) 1992, 1993, 1994 The Regents of the University of California. All rights reserved. I think it's: POSIX 1003.2, section 2.8 (Regular Expression Notation) -- Karel Zak <zakkr@zf.jcu.cz> http://home.zf.jcu.cz/~zakkr/ C, PostgreSQL, PHP, WWW, http://docs.linux.cz, http://mape.jcu.cz
Josh Jore <josh@greentechnologist.org> writes: > So I've finished reading Jeffery Friedl's _Mastering Regular Expressions_ > and while I don't need regex in PostgreSQL I know I'll do it for something > - eventually. The book makes a distinction between DFA, POSIX NFA and > Traditional NFA and then ascribes some properties and behaviours to each. > So what sort does PostgreSQL have? Well, you could read the code (src/backend/regex), or you could apply the tests that Friedl suggests to distinguish the type of an unknown engine ... My guess is that it's an NFA, but I dunno if Spencer did the POSIX semantics or not. regards, tom lane
Tom Lane wrote: > Josh Jore <josh@greentechnologist.org> writes: > > So I've finished reading Jeffery Friedl's _Mastering Regular Expressions_ > > and while I don't need regex in PostgreSQL I know I'll do it for something > > - eventually. The book makes a distinction between DFA, POSIX NFA and > > Traditional NFA and then ascribes some properties and behaviours to each. > > So what sort does PostgreSQL have? > > Well, you could read the code (src/backend/regex), or you could apply > the tests that Friedl suggests to distinguish the type of an unknown > engine ... > > My guess is that it's an NFA, but I dunno if Spencer did the POSIX > semantics or not. I am continuing to talk to Henry about getting a newer version of his regex library. He is working on it but is not yet finished. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
On Tue, 10 Sep 2002, Bruce Momjian wrote: > Tom Lane wrote: > > Josh Jore <josh@greentechnologist.org> writes: > > > So I've finished reading Jeffery Friedl's _Mastering Regular Expressions_ > > > and while I don't need regex in PostgreSQL I know I'll do it for something > > > - eventually. The book makes a distinction between DFA, POSIX NFA and > > > Traditional NFA and then ascribes some properties and behaviours to each. > > > So what sort does PostgreSQL have? > > > > Well, you could read the code (src/backend/regex), or you could apply > > the tests that Friedl suggests to distinguish the type of an unknown > > engine ... > > > > My guess is that it's an NFA, but I dunno if Spencer did the POSIX > > semantics or not. > > I am continuing to talk to Henry about getting a newer version of his > regex library. He is working on it but is not yet finished. Do you have some details about the implementation itself? I would like to work on implementing a regex engine using the Shift-And algorithm families, and that can probably give a performance improvement over the current engine. -- Alvaro Herrera (<alvherre[@]dcc.uchile.cl>)
Alvaro Herrera wrote: > > I am continuing to talk to Henry about getting a newer version of his > > regex library. He is working on it but is not yet finished. > > Do you have some details about the implementation itself? I would like > to work on implementing a regex engine using the Shift-And algorithm > families, and that can probably give a performance improvement over the > current engine. [ CC'ing Henry Spencer. ] No, I don't know any details except that his regex library is in the new tcl code and has to repackaged as a separate library. Henry, could we help you finish up your regex stuff? Henry's regex work is the same code that is in *BSD regex (at least BSD/OS, FreeBSD, NetBSD), which I have found to be pretty slow in certain complex cases, so much so that I have moved to super-sed (ssed) for use on my local machine where I have seen 100x speed improvements with ssed. I am not sure how slow our regex really is, but for the sake of PostgreSQL and of the BSD regex library, I would love to see a newer version. In fact, I just had an email exchange with Henry about two weeks ago. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
Bruce Momjian <pgman@candle.pha.pa.us> writes: > Henry's regex work is the same code that is in *BSD regex (at least > BSD/OS, FreeBSD, NetBSD), which I have found to be pretty slow in > certain complex cases, You're speaking of his *old* package (the one we currently use), no? Friedl seems to think that the current Tcl regex engine (Henry's new code) is the most advanced thing on the planet. regards, tom lane
Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > Henry's regex work is the same code that is in *BSD regex (at least > > BSD/OS, FreeBSD, NetBSD), which I have found to be pretty slow in > > certain complex cases, > > You're speaking of his *old* package (the one we currently use), no? Yes, that is the old stuff shipped with BSD 4.4 and included in all the *BSD releases I have checked. > Friedl seems to think that the current Tcl regex engine (Henry's new > code) is the most advanced thing on the planet. I have not heard that, but it is good to hear. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
Tom Lane dijo: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > Henry's regex work is the same code that is in *BSD regex (at least > > BSD/OS, FreeBSD, NetBSD), which I have found to be pretty slow in > > certain complex cases, > > You're speaking of his *old* package (the one we currently use), no? > > Friedl seems to think that the current Tcl regex engine (Henry's new > code) is the most advanced thing on the planet. Oh, so the TODO item "replace with newer code" is not just vaporware? Well, I won't try to compete with Spencer's code in that case. -- Alvaro Herrera (<alvherre[a]atentus.com>) "Linux transformó mi computadora, de una `máquina para hacer cosas', en un aparato realmente entretenido, sobre el cual cada día aprendo algo nuevo" (Jaime Salinas)
Alvaro Herrera <alvherre@atentus.com> writes: > Tom Lane dijo: >> Friedl seems to think that the current Tcl regex engine (Henry's new >> code) is the most advanced thing on the planet. > Oh, so the TODO item "replace with newer code" is not just vaporware? > Well, I won't try to compete with Spencer's code in that case. The code is certainly not vaporware: Tcl's been using it for awhile. But the code in a Tcl-independent package that we could easily use is a different story. Perhaps you could offer Henry some help in making it into a clean library package ... regards, tom lane