Re: Querying a time range across multiple partitions

Поиск
Список
Период
Сортировка
От Andreas Brandl
Тема Re: Querying a time range across multiple partitions
Дата
Msg-id 149ba37d-0688-48bc-98f0-0ada102dd377@store1.zcs.ext.wpsrv.net
обсуждение исходный текст
Ответ на Re: Querying a time range across multiple partitions  (John R Pierce <pierce@hogranch.com>)
Ответы Re: Querying a time range across multiple partitions  (John R Pierce <pierce@hogranch.com>)
Re: Querying a time range across multiple partitions  (Jeff Janes <jeff.janes@gmail.com>)
Список pgsql-general
John,

> On Fri, Sep 5, 2014 at 2:17 PM, John R Pierce < pierce@hogranch.com >
> wrote:
>> On 9/5/2014 10:31 AM, Cal Heldenbrand wrote:
>> Number of child tables: 1581
>> that's an insane number of children. We try and limit it to 50 or so
>> child tables, for instance, 6 months retention by week, of data will
>> millions of rows/day.
>>
>> I've used more than that many for testing purposes, and there was
>> little problem. The main thing is that your insert trigger (if you
>> have one on the master table) needs to be structured as a binary
>> search-like nesting of if..elsif, not a linear-searching like
>> structure. Unless of course almost all inserts go into the newest
>> partition, then it might make more sense to do the linear search
>> with that being the first test. But for performance, better to just
>> insert directly into the correct child table.
> any select that can't be preplanned to a specific child will need to
> check all 1500 children. this is far less efficient than checking,
> say, 50 and letting the b-tree index of each child reject or narrow
> down to the specific row(s). The one is roughly 1500*log(N/1500)
> while the other is 50*log(N/50) at least to a first order
> approximation.

can you explain that further? In the end, that argument sounds like it would always be more efficient to use a single
tableand its index instead, rather than partitioning it (log(N) < c*log(N/c) for any c > 1, if I'm not totally lost
today).

So, is there any insights of how many partitions are still useful?

I have tables with roughly 1000 partitions and did not have any issues so far. Even with having INSERT rules that are
linedup worst-case (from past to current, while data is always being inserted for the current date), I haven't seen any
considerabledegradation of INSERT performance so far. 

Thanks,
Andreas


В списке pgsql-general по дате отправления:

Предыдущее
От: Jeff Janes
Дата:
Сообщение: Last active time for a database
Следующее
От: John R Pierce
Дата:
Сообщение: Re: Querying a time range across multiple partitions