Re: [PATCH] optional cleaning queries stored in pg_stat_statements

Поиск
Список
Период
Сортировка
От Greg Smith
Тема Re: [PATCH] optional cleaning queries stored in pg_stat_statements
Дата
Msg-id 4EB6E65C.90406@2ndQuadrant.com
обсуждение исходный текст
Ответ на Re: [PATCH] optional cleaning queries stored in pg_stat_statements  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: [PATCH] optional cleaning queries stored in pg_stat_statements  (Robert Haas <robertmhaas@gmail.com>)
Список pgsql-hackers
On 11/06/2011 01:08 PM, Tom Lane wrote:
> Peter Geoghegan<peter@2ndquadrant.com>  writes:
>    
>> ... It also does things
>> like intelligently distinguishing between queries with different
>> limit/offset constant values, as these constants are deemed to be
>> differentiators of queries for our purposes. A guiding principal that
>> I've followed is that anything that could result in a different plan
>> is a differentiator of queries.
>>      
> This claim seems like bunk, unless you're hashing the plan tree,
> in which case it's tautological.

Peter's patch adds a planner hook and hashes at that level.  Since this 
cat is rapidly escaping its bag and impacting other contributors, we 
might as well share the work in progress early if anyone has a burning 
desire to look at the code.  A diff against the version I've been 
internally reviewing in prep for community submission is at 
https://github.com/greg2ndQuadrant/postgres/compare/master...pg_stat_statements_norm  
Easier to browse than ask questions probing about it I think.  Apologies 
to Peter for sharing his work before he was completely ready; there is a 
list of known problems with it.  I also regret the thread hijack here.

The first chunk of code is a Python based regression test program, and 
an open item here is the best way to turn that into a standard 
regression test set.

>> There will be additional infrastructure added to the parser to support
>> normalisation of query strings for the patch I'll be submitting (that
>> obviously won't be supported in the version that builds against
>> existing Postgres versions that I'll make available). Essentially,
>> I'll be adding a length field to certain nodes,
>>      
> This seems like a good way to get your patch rejected: adding overhead
> to the core system for the benefit of a feature that not everyone cares
> about is problematic.

Struggling around this area is the main reason this code hasn't been 
submitted yet.  To step back for a moment, there are basically three 
options here that any code like this can take, in regards to storing the 
processed query name used as the key:

1) Use the first instance of that query seen as the "reference" version
2) Use the last instance seen
3) Transform the text of the query in a way that's stable across all 
duplicates of that statement, and output that

Downstream tools operating on this data, things that will sample 
pg_stat_statements multiple times, need some sort of stable query text 
they can operate on.  It really needs to survive server restart too.  
Neither (1) nor (2) seem like usable solutions.  Both have been 
implemented already in Peter's patch, with (2) being what's in the 
current patch.  How best to do (3) instead is not obvious though.

Doing the query matching operating at the planner level seems effective 
at side-stepping the problem of needing to parse the SQL, which is where 
most implementations of this idea get stuck doing fragile 
transformations.  My own first try at this back in September and Tomas's 
patch both fall into that category.  That part of Peter's work seems to 
work as expected.  That still leaves the problem of "parsed query -> 
stable normalized string".  I think that is an easier one to solve than 
directly taking on the entirety of "query text -> stable normalized 
string" though.  Peter has an idea he's exploring for how to implement 
that, and any amount of overhead it adds to people who don't use this 
feature is an obvious concern.  There are certainly use cases that don't 
care about this problem, ones that would happily take (1) or (2).  I'd 
rather not ship yet another not quite right for everyone version of 
pg_stat_statements again though; only solving too limited of a use case 
is the big problem with the one that's already there.

-- 
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Jeroen Vermeulen
Дата:
Сообщение: Re: foreign key locks, 2nd attempt
Следующее
От: Magnus Hagander
Дата:
Сообщение: Re: Measuring relation free space