Re: On Scalability

Поиск
Список
Период
Сортировка
От Greg Smith
Тема Re: On Scalability
Дата
Msg-id 4C533839.8090705@2ndquadrant.com
обсуждение исходный текст
Ответ на Re: On Scalability  (Vincenzo Romano <vincenzo.romano@notorand.it>)
Список pgsql-hackers
Vincenzo Romano wrote:
> By using PREPARE I run the query planned sooner and I should use
> the plan with the later execution.
> You can bet that some of the PREPAREd query variables will
> pertain to either the child table's CHECK contraints (for table partitions)
> or to the partial index's WHERE condition (for index partitioning).
>   

Prepared statements are not necessarily a cure for long query planning 
time, because the sort of planning decisions made with partitioned child 
tables and index selection can need to know the parameter values to 
execute well; that's usually the situation rather than the exception 
with partitions.  You run the risk that the generic prepared plan will 
end up looking at all the partitions, because at preparation plan time 
it can't figure out which can be excluded.  Can only figure that out 
once they're in there for some types of queries.

I think you aren't quite lined up with the people suggesting "test it" 
in terms of what that means.  The idea is not that you should build a 
full on application test case yet, which can be very expensive.  The 
idea is that you might explore things like "when I partition this way 
increasing the partitions from 1 to n, does query time go up linearly?" 
by measuring with fake data and a machine-generated schema.  What's 
happened in some of these cases is that, despite the theoretical, some 
constant or external overhead ends up dominating behavior for lower 
numbers.  As an example, it was recognized that the amount of statistics 
for a table collected with default_statistics_target had a quadratic 
impact on some aspects of performance.  But it turned out that for the 
range of interesting values to most people, the measured runtime did not 
go up with the square as feared.  Only way that was sorted out was to 
build a simple simulation.

Here's a full example from that discussion that shows the sort of tests 
you probably want to try, and comments on the perils of guessing based 
on theory rather than testing:

http://archives.postgresql.org/pgsql-hackers/2008-12/msg00601.php
http://archives.postgresql.org/pgsql-hackers/2008-12/msg00687.php

generate_series can be very helpful here, and you can even use that to 
generate timestamps if you need them in the data set.

That said, anecdotally everyone agrees that partitions don't scale well 
into even the very low hundreds for most people, and doing multi-level 
ones won't necessarily normally drop query planning time--just the cost 
of maintaining the underlying tables and indexes.  My opinion is that 
building a simple partitioned case and watching how the EXPLAIN plans 
change as you adjust things will be more instructive for you than either 
asking about it or reading the source.  Vary the parameters, watch the 
plans, measure things and graph them if you want to visualize the 
behavior better.  Same thing goes for large numbers of partial indexes, 
which have a similar query planning impact, but unlike partitions I 
haven't seen anyone analyze them via benchmarks.  I'm sure you could get 
help here (probably the performance list is a better spot though) with 
getting your test case right if you wanted to try and nail that down.

-- 
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Yeb Havinga
Дата:
Сообщение: Re: patch for check constraints using multiple inheritance
Следующее
От: Robert Haas
Дата:
Сообщение: Re: On Scalability