Обсуждение: Emulating flexible regex replace

Поиск
Список
Период
Сортировка

Emulating flexible regex replace

От
Francisco Olarte
Дата:
( Forgot reply all, forwarding a copy, sorry for the noise. )

Hullo.

On Thu, Oct 23, 2014 at 4:03 PM, twoflower <standa.kurik@gmail.com> wrote:

my scenario is this: I have a *SEGMENT* table with two text fields, *source*
and *target*. From the user, I get the following input:

/source pattern/
/target pattern/

Where both patterns are regexes and moreover the target pattern contains
references to the source in the following way:

Supposing *source* matches the /source pattern/, the $/n/ expressions inside
the /target pattern/ correspond to the captured groups inside *source*.


*If* you have some available char sequence which is not going to be in any of them and your regexp are not too esoteric, you could just try to match source || 'XyZzYX' || target against source_pattern || 'XyZzYX' || target_pattern, replace separator as needed.

If your patterns are ( not ) anchored you may need to insert some .* / ^ / $ and multiline modifiers, but this normally works for me ( in perl normally, but should be easy to do ).

Your performance maybe really bad if your tables are big and you have no more conditions, but any query with general patterns tend to behave this way.

Francisco Olarte.


Re: Emulating flexible regex replace

От
twoflower
Дата:
Thank you Francisco, that's a clever idea. However, I don't think this would
reduce the complexity since the target pattern can contain

1) regular back-references (referencing to matches of its own)
2) the special source text references I mentioned

Obviously, these will have to be written in a different way and this I
believe brings me back to start (or in other words, it's not a silver bullet
obviating the need to rewrite the target pattern manually).

I will probably end up writing a function in PL/Perl which Tom Lane
suggested since I'm apparently not skilled in SQL enough to be able to do it
using a single query without using custom functions.



--
View this message in context:
http://postgresql.1045698.n5.nabble.com/Emulating-flexible-regex-replace-tp5824058p5824109.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


Re: Emulating flexible regex replace

От
Francisco Olarte
Дата:
Hi:

On Fri, Oct 24, 2014 at 8:24 AM, twoflower <standa.kurik@gmail.com> wrote:
Thank you Francisco, that's a clever idea. However, I don't think this would
reduce the complexity since the target pattern can contain

1) regular back-references (referencing to matches of its own)
2) the special source text references I mentioned

Well, if I had this problem I would consider solving part of it in the application, not in the database ( like rewriting two patterns into one on the app ).

Obviously, these will have to be written in a different way and this I
believe brings me back to start (or in other words, it's not a silver bullet
obviating the need to rewrite the target pattern manually).

Yeah, I would solve it renumbering backreferences, but it will be a genuine PITA, specially if you want to allow an arbitrary number of them.
 
I will probably end up writing a function in PL/Perl which Tom Lane
suggested since I'm apparently not skilled in SQL enough to be able to do it
using a single query without using custom functions.

You can do two things with a pl/perl function, one is write and use it in the condition of a general query ( where supermatch(s,t,s_p,t_p) ) or, depending on your query, you may write a table return function ( select whatever from my_fancy_func(s_p, t_p, whatever_else ). And, IMO, trying to solve everything in SQL is not always TRTTD. There are a lot of special apps which needs to be solved in a mix of code, and in problems as complex as the one you are proposing ( like, when filtering CDRs by esoteric criteria in my apps ) I've many times found that the easier ( and faster ) way is let the db do it's stuff in sql and postfilter it in the app ( in my case, let the db do time range, duration and number prefix filtering, which it's really good at, reducing the dataset from billions to tens of thousands of records which it streams really fast into a perl app which then does the later fancy conditions in a breeze ).
 
Regards.
Francisco Olarte.

Re: Emulating flexible regex replace

От
twoflower
Дата:
Thank you Francisco.

In fact, I am already solving part of the problem in my application -
fetching from the DB the records matching the source pattern and then
filtering them in the application's memory by matching against the target
pattern, with the references replaced (it's a breeze in C#).

It works very vell. However, I am not completely satisfied with i as it's
unnecessarily loading larger data set than it absolutely must. Besides, I'd
also like to get some experience in DB programming. That's why the PL/Perl
way seems pretty attractive to me.



--
View this message in context:
http://postgresql.1045698.n5.nabble.com/Emulating-flexible-regex-replace-tp5824058p5824174.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


Re: Emulating flexible regex replace

От
Francisco Olarte
Дата:
Hi:

On Fri, Oct 24, 2014 at 6:13 PM, twoflower <standa.kurik@gmail.com> wrote:
....
It works very vell. However, I am not completely satisfied with i as it's
unnecessarily loading larger data set than it absolutely must. Besides, I'd
also like to get some experience in DB programming. That's why the PL/Perl
way seems pretty attractive to me.

Try it them. But bear in mind you are not *unnecessarily* loading the data set, the server will need to load it to apply the plperl filters, you will just avoid loading it in the client and transmitting it. You are making a trade off, a design decission. You are trading some network and client cpu usage for some server cpu usage and coding complexity. The tricky part is the the coding complexity, which needs a lot of resources you may never recover. Part of the experience in DB programming is knowing what NOT to do in the database, so coding this will be good, the worst thing that could happen is you do it and then drop it after learning one thing.

Francisco Olarte.