Re: [PATCH] optional cleaning queries stored in pg_stat_statements

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: [PATCH] optional cleaning queries stored in pg_stat_statements
Дата
Msg-id 2223.1320592101@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Re: [PATCH] optional cleaning queries stored in pg_stat_statements  (Peter Geoghegan <peter@2ndquadrant.com>)
Ответы Re: [PATCH] optional cleaning queries stored in pg_stat_statements  ("Tomas Vondra" <tv@fuzzy.cz>)
Re: [PATCH] optional cleaning queries stored in pg_stat_statements  (Greg Smith <greg@2ndQuadrant.com>)
Re: [PATCH] optional cleaning queries stored in pg_stat_statements  (Peter Geoghegan <peter@2ndquadrant.com>)
Список pgsql-hackers
Peter Geoghegan <peter@2ndquadrant.com> writes:
> I'm a couple of days away from posting a much better principled
> implementation of pg_stat_statements normalisation. To normalise, we
> perform a selective serialisation of the query tree, which is hashed.

That seems like an interesting approach, and definitely a lot less of
a kluge than what Tomas suggested.  Are you intending to hash the raw
grammar output tree, the parse analysis result, the rewriter result,
or the actual plan tree?  I don't necessarily have a preconceived notion
about which is best (I can think of pluses and minuses for each), but
we should hear your explanation of which one you chose and what your
reasoning was.

> ... It also does things
> like intelligently distinguishing between queries with different
> limit/offset constant values, as these constants are deemed to be
> differentiators of queries for our purposes. A guiding principal that
> I've followed is that anything that could result in a different plan
> is a differentiator of queries.

This claim seems like bunk, unless you're hashing the plan tree,
in which case it's tautological.  As an example, switching from
"where x < 1" to "where x < 2" could make the planner change plans,
if there's a sufficiently large difference in the selectivity of the two
cases (or even if there's a very small difference, but we're right at
the tipping point where the estimated costs are equal).  There's no way
to know this in advance unless you care to duplicate all the planner
cost estimation logic.  And I don't think you actually want to classify
"where x < 1" and "where x < 2" differently just because they *might*
give different plans in corner cases.

I'm not real sure whether it's better to classify on the basis of
similar plans or similar original queries, anyway.  This seems like
something that could be open for debate about use-cases.

> There will be additional infrastructure added to the parser to support
> normalisation of query strings for the patch I'll be submitting (that
> obviously won't be supported in the version that builds against
> existing Postgres versions that I'll make available). Essentially,
> I'll be adding a length field to certain nodes,

This seems like a good way to get your patch rejected: adding overhead
to the core system for the benefit of a feature that not everyone cares
about is problematic.  Why do you need it anyway?  Surely it's possible
to determine the length of a literal token after the fact.

More generally, if you're hashing anything later than the raw grammar
tree, I think that generating a suitable representation of the queries
represented by a single hash entry is going to be problematic anyway.
There could be significant differences --- much more than just the
values of constants --- between query strings that end up being
semantically the same.  Or for that matter we could have identical query
strings that wind up being considered different because of the action of
search_path or other context.

It might be that the path of least resistance is to document that we
select one of the actual query strings "at random" to represent all the
queries that went into the same hash entry, and not even bother with
trying to strip out constants.  The effort required to do that seems
well out of proportion to the reward, if we can't do a perfect job of
representing the common aspects of the queries.
        regards, tom lane


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Sushant Sinha
Дата:
Сообщение: Re: a tsearch issue
Следующее
От: hubert depesz lubaczewski
Дата:
Сообщение: Re: [GENERAL] Strange problem with create table as select * from table;