Re: Querying a time range across multiple partitions

Поиск
Список
Период
Сортировка
От Jeff Janes
Тема Re: Querying a time range across multiple partitions
Дата
Msg-id CAMkU=1yvqNh1MwSNXNH2ToeWhhq_f+mVwW-MELZyBFwMz8PmeQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Querying a time range across multiple partitions  (Andreas Brandl <ml@3.141592654.de>)
Список pgsql-general
On Mon, Sep 8, 2014 at 1:40 PM, Andreas Brandl <ml@3.141592654.de> wrote:
John,

> On Fri, Sep 5, 2014 at 2:17 PM, John R Pierce < pierce@hogranch.com >
> wrote:
>> On 9/5/2014 10:31 AM, Cal Heldenbrand wrote:
>> Number of child tables: 1581
>> that's an insane number of children. We try and limit it to 50 or so
>> child tables, for instance, 6 months retention by week, of data will
>> millions of rows/day.
>>
>> I've used more than that many for testing purposes, and there was
>> little problem. The main thing is that your insert trigger (if you
>> have one on the master table) needs to be structured as a binary
>> search-like nesting of if..elsif, not a linear-searching like
>> structure. Unless of course almost all inserts go into the newest
>> partition, then it might make more sense to do the linear search
>> with that being the first test. But for performance, better to just
>> insert directly into the correct child table.
> any select that can't be preplanned to a specific child will need to
> check all 1500 children. this is far less efficient than checking,
> say, 50 and letting the b-tree index of each child reject or narrow
> down to the specific row(s). The one is roughly 1500*log(N/1500)
> while the other is 50*log(N/50) at least to a first order
> approximation.

can you explain that further? In the end, that argument sounds like it would always be more efficient to use a single table and its index instead, rather than partitioning it (log(N) < c*log(N/c) for any c > 1, if I'm not totally lost today).

Right.  Partitioning is NOT a generic way to improve query performance.  With rare exceptions, partitioning makes query performance worse, and then partition pruning may or may not serve to recover most of that lost performance.

Partitioning allows you conspire with PostgreSQL to micromanage the the layout of the data in ways that improve manageability and maintainability.  For example, you pay a little overhead each time you run a query and (perhaps) each time you insert a row, but in exchange for that you can "delete" a year of data with a nearly-instantaneous command.
 
Cheers,

Jeff

В списке pgsql-general по дате отправления:

Предыдущее
От: lists@benjamindsmith.com
Дата:
Сообщение: Crash in 9.4 Beta when partially collapsing left outer joins
Следующее
От: Abelard Hoffman
Дата:
Сообщение: Re: psql and tab-delimited output