Обсуждение: regexp_replace to remove sql comments
Hi,
I am trying to clean up the query field returned by the
pg_stat_statements extension and remove all comments.
Some of the queries in the query field contain comments like '-- some
comment' and also '/* c style comments */'
I have managed to strip off the '--' comments and also white space but
after trying numerous regex for this via google but I am stuck.
WITH to_clean AS (
SELECT
regexp_replace(
regexp_replace(trim(query), '--[^\r\n]*', '') --clear up
comments like this one <-- this is ok
, '\s+', ' ', 'g') as q --clear up white space <-- this is ok
FROM public.pg_stat_statements
WHERE dbid IN (SELECT oid FROM pg_database WHERE datname =
current_database())
)
SELECT regexp_replace(q,'/\*.*\*/','') as q /* strip off comments like
this */ <-- cannot get a regex to do this
FROM to_clean ORDER BY q
Im now thinking it may be better to do in a pgsql function as I think if
the comments are in queries then they need to be ignored.
Has anyone done anything like this?
Thanks,
Mike.
>________________________________________
>Von: pgsql-general-owner@postgresql.org [pgsql-general-owner@postgresql.org]" im Auftrag von "Mike
[mike@wolman.co.uk]
>Gesendet: Mittwoch, 28. Oktober 2015 20:04
>An: pgsql-general@postgresql.org
>Betreff: [GENERAL] regexp_replace to remove sql comments
>
>Hi,
>
>I am trying to clean up the query field returned by the
>pg_stat_statements extension and remove all comments.
>
>Some of the queries in the query field contain comments like '-- some
>comment' and also '/* c style comments */'
>
>I have managed to strip off the '--' comments and also white space but
>after trying numerous regex for this via google but I am stuck.
>
>WITH to_clean AS (
> SELECT
> regexp_replace(
> regexp_replace(trim(query), '--[^\r\n]*', '') --clear up
>comments like this one <-- this is ok
> , '\s+', ' ', 'g') as q --clear up white space <-- this is ok
> FROM public.pg_stat_statements
> WHERE dbid IN (SELECT oid FROM pg_database WHERE datname =
>current_database())
>)
>
>SELECT regexp_replace(q,'/\*.*\*/','') as q /* strip off comments like
>this */ <-- cannot get a regex to do this
>FROM to_clean ORDER BY q
Hi,
Does this help ?
select regexp_replace(' aaa /*
x
y
z
*/ foo', '\/\*.+\*\/','','g'):
regards,
Marc Mamin
>
>Im now thinking it may be better to do in a pgsql function as I think if
>the comments are in queries then they need to be ignored.
>
>Has anyone done anything like this?
>
>Thanks,
>
>Mike.
>
>
>
>--
>Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
>To make changes to your subscription:
>http://www.postgresql.org/mailpref/pgsql-general
>
Thanks with a bit of moving stuff about I think thats sorted it - in
case anyone every needs it:
SELECT
query,
trim(regexp_replace(
regexp_replace(
regexp_replace(query,'\/\*.+\*\/','','g'),
'--[^\r\n]*', ' ', 'g')
, '\s+', ' ', 'g')) as q
FROM public.pg_stat_statements
WHERE dbid IN (SELECT oid FROM pg_database WHERE datname =
current_database())
order by query
Thanks again,
Mike.
On 28/10/2015 22:43, Marc Mamin wrote:
> ', '\/\*.+\*\/','','g'):
Mike <mike@wolman.co.uk> writes:
> Thanks with a bit of moving stuff about I think thats sorted it - in
> case anyone every needs it:
> SELECT
> query,
> trim(regexp_replace(
> regexp_replace(
> regexp_replace(query,'\/\*.+\*\/','','g'),
> '--[^\r\n]*', ' ', 'g')
> , '\s+', ' ', 'g')) as q
> FROM public.pg_stat_statements
> WHERE dbid IN (SELECT oid FROM pg_database WHERE datname =
> current_database())
This doesn't look too reliable from here:
1. Doesn't handle multiline /* comments.
2. Does wrong thing if more than one /* comment appears on one line.
(You could improve that by using .*? instead of .+, but then it'd
do the wrong thing with nested /* comments.)
3. Breaks things if either -- or /* appear inside a string literal,
double-quoted identifier, or $$ literal.
I'm not at all sure that it's possible to handle this requirement 100%
correctly with regexes; they're unable to do context-sensitive processing.
But so far as pg_stat_statements is concerned, why would you need to
do this at all? The duplicate-query elimination it does should be
insensitive to comments already.
regards, tom lane