Обсуждение: [GENERAL] Partitioning and Table Inheritance
I'm working on a problem where partitioning seems to be the right approach, but we would need a lot of partitions (say 10k or 100k). Everywhere I read that after ~100 child tables you experience problems. I have a few questions about that: 1. Is it true that the only disadvantage to 10k children is query planning time when hitting the parent table? Is there anything else to worry about? (Unlike ext3, ext4 seems to have no problem with lots of files in one directory (e.g. 1,000,000). It doesn't break and it even performs fine.) I know a lot of tables increases memory use. There is clutter too in \dt but I think I can use schemas to cut down on that. Anything else? 2. Is there any reason I shouldn't bypass the query planning cost by having the app just SELECT from the correct child table (which is easy for us 99% of the time)? Similarly I could bypass trigger time by INSERTing directly into the child, etc. 3. Is it true that query planning time should scale linearly as I add more child tables? I'm seeing ~16ms planning time from EXPLAIN ANALYZE with 1000 tables. I haven't tried 10k tables yet, but so far 16ms or even 160ms seems tolerable if it's only for the 1% of queries that can't SELECT directly from a known child table. 4. I tried a scheme where instead of one parent table with 1000 direct children, I have 1 parent with 10 children, and they each have 10 children, and they each have 10 children. I thought by using increasingly-specific constraints I could maybe make query planning scale by O(log n) instead of O(n), but instead it went up! Upon investigating it looks like maybe the planner flattens the inheritance hierarchy before doing constraint_exclusion work. Is that true? If so, is there any interest in changing this in future versions? (I might be willing to take a stab at a patch for it.) Btw this is on 9.4 but it could be 9.5 or 9.6 if that would help any. Thanks, Paul
On Mon, May 08, 2017 at 10:12:18AM -0700, Paul A Jungwirth wrote: > I'm working on a problem where partitioning seems to be the right > approach, but we would need a lot of partitions (say 10k or 100k). > Everywhere I read that after ~100 child tables you experience > problems. I have a few questions about that: We use partitioning, previously one child per month (with history of 1-6 years); I tried using one child per day, and caused issues. For us, planning time is pretty unimportant (~1sec would be acceptable 99% of the time) but I recall seeing even more than that. I changed to using daily granularity for only our largest tables, which seems to be working fine for the last ~9months. So the issue isn't just "number of children" but "total number of tables". I believe the problem may have been due to large pg_statistic/pg_attribute and similar tables taking more than a few 100MBs, and potentially no longer fitting in buffer cache. > 3. Is it true that query planning time should scale linearly as I add > more child tables? I believe it's understood to be super-linear: https://www.postgresql.org/message-id/26761.1483989025%40sss.pgh.pa.us https://www.postgresql.org/message-id/4188.1298960419%40sss.pgh.pa.us Justin
Hi 12.05.2017 23:22, Justin Pryzby пишет: > On Mon, May 08, 2017 at 10:12:18AM -0700, Paul A Jungwirth wrote: >> I'm working on a problem where partitioning seems to be the right >> approach, but we would need a lot of partitions (say 10k or 100k). >> Everywhere I read that after ~100 child tables you experience >> problems. I have a few questions about that: > We use partitioning, previously one child per month (with history of 1-6 > years); I tried using one child per day, and caused issues. > > For us, planning time is pretty unimportant (~1sec would be acceptable 99% of > the time) but I recall seeing even more than that. I changed to using daily > granularity for only our largest tables, which seems to be working fine for the > last ~9months. So the issue isn't just "number of children" but "total number > of tables". I believe the problem may have been due to large > pg_statistic/pg_attribute and similar tables taking more than a few 100MBs, and > potentially no longer fitting in buffer cache. > >> 3. Is it true that query planning time should scale linearly as I add >> more child tables? > I believe it's understood to be super-linear: > https://www.postgresql.org/message-id/26761.1483989025%40sss.pgh.pa.us > https://www.postgresql.org/message-id/4188.1298960419%40sss.pgh.pa.us There is an extension called pg_pathman which seriously optimizes the table partitioning, it might help in your case: https://github.com/postgrespro/pg_pathman See also: https://www.postgresql.org/message-id/4188.1298960419%40sss.pgh.pa.us and https://postgrespro.com/blog/pgsql/pg_pathman_e > > Justin > > Ivan
Hi Paul,
How much of your data is time-series in nature? Put another way, is there a timestamp coupled with the inserted data?
Andrew
On Fri, May 12, 2017 at 4:38 PM, Ivan E. Panchenko <i.panchenko@postgrespro.ru> wrote:
Hi
12.05.2017 23:22, Justin Pryzby пишет:On Mon, May 08, 2017 at 10:12:18AM -0700, Paul A Jungwirth wrote:There is an extension called pg_pathman which seriously optimizes the table partitioning, it might help in your case:I'm working on a problem where partitioning seems to be the rightWe use partitioning, previously one child per month (with history of 1-6
approach, but we would need a lot of partitions (say 10k or 100k).
Everywhere I read that after ~100 child tables you experience
problems. I have a few questions about that:
years); I tried using one child per day, and caused issues.
For us, planning time is pretty unimportant (~1sec would be acceptable 99% of
the time) but I recall seeing even more than that. I changed to using daily
granularity for only our largest tables, which seems to be working fine for the
last ~9months. So the issue isn't just "number of children" but "total number
of tables". I believe the problem may have been due to large
pg_statistic/pg_attribute and similar tables taking more than a few 100MBs, and
potentially no longer fitting in buffer cache.3. Is it true that query planning time should scale linearly as I addI believe it's understood to be super-linear:
more child tables?
https://www.postgresql.org/message-id/26761.1483989025%40sss .pgh.pa.us
https://www.postgresql.org/message-id/4188.1298960419%40sss. pgh.pa.us
https://github.com/postgrespro/pg_pathman
See also:
https://www.postgresql.org/message-id/4188.1298960419%40sss. pgh.pa.us and
https://postgrespro.com/blog/pgsql/pg_pathman_e Ivan
Justin
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
TimescaleDB | Growth & Developer Evangelism
c: 908.581.9509
335 Madison Ave.
New York, NY 10017