Обсуждение: [GENERAL] Partitioning and Table Inheritance

Поиск
Список
Период
Сортировка

[GENERAL] Partitioning and Table Inheritance

От
Paul A Jungwirth
Дата:
I'm working on a problem where partitioning seems to be the right
approach, but we would need a lot of partitions (say 10k or 100k).
Everywhere I read that after ~100 child tables you experience
problems. I have a few questions about that:

1. Is it true that the only disadvantage to 10k children is query
planning time when hitting the parent table? Is there anything else to
worry about? (Unlike ext3, ext4 seems to have no problem with lots of
files in one directory (e.g. 1,000,000). It doesn't break and it even
performs fine.) I know a lot of tables increases memory use. There is
clutter too in \dt but I think I can use schemas to cut down on that.
Anything else?

2. Is there any reason I shouldn't bypass the query planning cost by
having the app just SELECT from the correct child table (which is easy
for us 99% of the time)? Similarly I could bypass trigger time by
INSERTing directly into the child, etc.

3. Is it true that query planning time should scale linearly as I add
more child tables? I'm seeing ~16ms planning time from EXPLAIN ANALYZE
with 1000 tables. I haven't tried 10k tables yet, but so far 16ms or
even 160ms seems tolerable if it's only for the 1% of queries that
can't SELECT directly from a known child table.

4. I tried a scheme where instead of one parent table with 1000 direct
children, I have 1 parent with 10 children, and they each have 10
children, and they each have 10 children. I thought by using
increasingly-specific constraints I could maybe make query planning
scale by O(log n) instead of O(n), but instead it went up! Upon
investigating it looks like maybe the planner flattens the inheritance
hierarchy before doing constraint_exclusion work. Is that true? If so,
is there any interest in changing this in future versions? (I might be
willing to take a stab at a patch for it.)

Btw this is on 9.4 but it could be 9.5 or 9.6 if that would help any.

Thanks,
Paul


Re: [GENERAL] Partitioning and Table Inheritance

От
Justin Pryzby
Дата:
On Mon, May 08, 2017 at 10:12:18AM -0700, Paul A Jungwirth wrote:
> I'm working on a problem where partitioning seems to be the right
> approach, but we would need a lot of partitions (say 10k or 100k).
> Everywhere I read that after ~100 child tables you experience
> problems. I have a few questions about that:

We use partitioning, previously one child per month (with history of 1-6
years); I tried using one child per day, and caused issues.

For us, planning time is pretty unimportant (~1sec would be acceptable 99% of
the time) but I recall seeing even more than that.  I changed to using daily
granularity for only our largest tables, which seems to be working fine for the
last ~9months.  So the issue isn't just "number of children" but "total number
of tables".  I believe the problem may have been due to large
pg_statistic/pg_attribute and similar tables taking more than a few 100MBs, and
potentially no longer fitting in buffer cache.

> 3. Is it true that query planning time should scale linearly as I add
> more child tables?

I believe it's understood to be super-linear:
https://www.postgresql.org/message-id/26761.1483989025%40sss.pgh.pa.us
https://www.postgresql.org/message-id/4188.1298960419%40sss.pgh.pa.us

Justin


Re: [GENERAL] Partitioning and Table Inheritance

От
"Ivan E. Panchenko"
Дата:
Hi


12.05.2017 23:22, Justin Pryzby пишет:
> On Mon, May 08, 2017 at 10:12:18AM -0700, Paul A Jungwirth wrote:
>> I'm working on a problem where partitioning seems to be the right
>> approach, but we would need a lot of partitions (say 10k or 100k).
>> Everywhere I read that after ~100 child tables you experience
>> problems. I have a few questions about that:
> We use partitioning, previously one child per month (with history of 1-6
> years); I tried using one child per day, and caused issues.
>
> For us, planning time is pretty unimportant (~1sec would be acceptable 99% of
> the time) but I recall seeing even more than that.  I changed to using daily
> granularity for only our largest tables, which seems to be working fine for the
> last ~9months.  So the issue isn't just "number of children" but "total number
> of tables".  I believe the problem may have been due to large
> pg_statistic/pg_attribute and similar tables taking more than a few 100MBs, and
> potentially no longer fitting in buffer cache.
>
>> 3. Is it true that query planning time should scale linearly as I add
>> more child tables?
> I believe it's understood to be super-linear:
> https://www.postgresql.org/message-id/26761.1483989025%40sss.pgh.pa.us
> https://www.postgresql.org/message-id/4188.1298960419%40sss.pgh.pa.us
There is an extension called pg_pathman which seriously optimizes the
table partitioning, it might help in your case:

https://github.com/postgrespro/pg_pathman

See also:

https://www.postgresql.org/message-id/4188.1298960419%40sss.pgh.pa.us and

https://postgrespro.com/blog/pgsql/pg_pathman_e
>
> Justin
>
>
Ivan


Re: [GENERAL] Partitioning and Table Inheritance

От
Andrew Staller
Дата:
Hi Paul,

How much of your data is time-series in nature? Put another way, is there a timestamp coupled with the inserted data?

Andrew

On Fri, May 12, 2017 at 4:38 PM, Ivan E. Panchenko <i.panchenko@postgrespro.ru> wrote:
Hi


12.05.2017 23:22, Justin Pryzby пишет:
On Mon, May 08, 2017 at 10:12:18AM -0700, Paul A Jungwirth wrote:
I'm working on a problem where partitioning seems to be the right
approach, but we would need a lot of partitions (say 10k or 100k).
Everywhere I read that after ~100 child tables you experience
problems. I have a few questions about that:
We use partitioning, previously one child per month (with history of 1-6
years); I tried using one child per day, and caused issues.

For us, planning time is pretty unimportant (~1sec would be acceptable 99% of
the time) but I recall seeing even more than that.  I changed to using daily
granularity for only our largest tables, which seems to be working fine for the
last ~9months.  So the issue isn't just "number of children" but "total number
of tables".  I believe the problem may have been due to large
pg_statistic/pg_attribute and similar tables taking more than a few 100MBs, and
potentially no longer fitting in buffer cache.

3. Is it true that query planning time should scale linearly as I add
more child tables?
I believe it's understood to be super-linear:
https://www.postgresql.org/message-id/26761.1483989025%40sss.pgh.pa.us
https://www.postgresql.org/message-id/4188.1298960419%40sss.pgh.pa.us
There is an extension called pg_pathman which seriously optimizes the table partitioning, it might help in your case:

https://github.com/postgrespro/pg_pathman

See also:

https://www.postgresql.org/message-id/4188.1298960419%40sss.pgh.pa.us and

https://postgrespro.com/blog/pgsql/pg_pathman_e

Justin


Ivan



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general



--
TimescaleDB | Growth & Developer Evangelism
c: 908.581.9509

335 Madison Ave. 
New York, NY 10017