Обсуждение: BUG #5652: Optimizer does wrong thing with partitioned tables
The following bug has been logged online: Bug reference: 5652 Logged by: Mladen Gogala Email address: mladen.gogala@vmsinfo.com PostgreSQL version: 8.4.4 Operating system: Red Hat Linux 5.5, 64b Description: Optimizer does wrong thing with partitioned tables Details: Optimizer chooses to scan each partitioned table sequentially, instead of using the available index: news=# explain select max(created_at) from moreover_documents; QUERY PLAN ---------------------------------------------------------------------------- ---- --------------------------------------- Aggregate (cost=5115432.65..5115432.66 rows=1 width=8) -> Append (cost=0.00..5017318.72 rows=39245572 width=8) -> Seq Scan on moreover_documents (cost=0.00..10.20 rows=20 width=8) -> Seq Scan on moreover_documents_y2010m06 moreover_documents (cost=0 .00..236550.85 rows=1859585 width=8) -> Seq Scan on moreover_documents_y2010m07 moreover_documents (cost=0 .00..2073604.38 rows=16276938 width=8) -> Seq Scan on moreover_documents_y2010m08 moreover_documents (cost=0 .00..2022494.13 rows=15670513 width=8) -> Seq Scan on moreover_documents_y2010m09 moreover_documents (cost=0 .00..684618.36 rows=5438436 width=8) -> Seq Scan on moreover_documents_y2010m10 moreover_documents (cost=0 .00..10.20 rows=20 width=8) -> Seq Scan on moreover_documents_y2010m11 moreover_documents (cost=0 .00..10.20 rows=20 width=8) -> Seq Scan on moreover_documents_y2010m12 moreover_documents (cost=0 .00..10.20 rows=20 width=8) -> Seq Scan on moreover_documents_y2011m01 moreover_documents (cost=0 .00..10.20 rows=20 width=8) When on single partition, the optimizer does the right thing: news=# explain select max(created_at) from moreover_documents_y2010m09; QUERY PLAN ---------------------------------------------------------------------------- ---- ------------------------------------------------------------ Result (cost=0.15..0.16 rows=1 width=0) InitPlan 1 (returns $0) -> Limit (cost=0.00..0.15 rows=1 width=8) -> Index Scan Backward using mdocs_created_y2010m09 on moreover_docu ments_y2010m09 (cost=0.00..800757.60 rows=5438436 width=8) Filter: (created_at IS NOT NULL) (5 rows) Index is available on each and every partition. There is only one way to calculate the max for the entire partitioned table: to calculate max for each partition and then select max of those. Optimizer should take into account the available access paths for each partition, yet it doesn't do so.
Re: BUG #5652: Optimizer does wrong thing with partitioned tables
От
Euler Taveira de Oliveira
Дата:
Mladen Gogala escreveu: > Optimizer chooses to scan each partitioned table sequentially, instead of > using the available index: > This is not a bug. How would the optimizer know that the maximum value is in that specific partition? There is neither a global index for a partitioned table nor an optimizer artifact to know aggregate information before scanning all of the partitions. Maybe when we have a better support for table partitioning such optimizer artifact would be implemented but don't hold your breath. -- Euler Taveira de Oliveira http://www.timbira.com/
Euler Taveira de Oliveira wrote: > Mladen Gogala escreveu: > >> Optimizer chooses to scan each partitioned table sequentially, instead of >> using the available index: >> >> > This is not a bug. How would the optimizer know that the maximum value is in > that specific partition? There is neither a global index for a partitioned > table nor an optimizer artifact to know aggregate information before scanning > all of the partitions. Maybe when we have a better support for table > partitioning such optimizer artifact would be implemented but don't hold your > breath. > > > Euler, optimizer is selecting a wrong path, which is a bug by definition. I am not asking the optimizer to know what is the maximum, I am asking it to select the right path, something that it isn't doing. As is visible from the original bug report, the optimizer is choosing sequential scan instead of the index scan. When only a single partition is involved, the index scan is selected, while for the entire table, sequential scan is selected for every partition. -- Mladen Gogala Sr. Oracle DBA 1500 Broadway New York, NY 10036 (212) 329-5251 http://www.vmsinfo.com The Leader in Integrated Media Intelligence Solutions
Just adding my voice to the "fix it" camp. Is there any reason the table scans in this sort of thing cannot be independently planned? Best Wishes, Chris Travers
On Fri, 2010-09-10 at 10:37 -0400, Mladen Gogala wrote: > Euler, optimizer is selecting a wrong path, which is a bug by > definition. I agree that the optimizer should be improved here, but it's not really a "bug". I think what you are requesting is considered more of a feature to make the optimizer smarter. The optimizer will never be so good that it always picks the best path. Consider a 100-table join: would it be a "bug" if it didn't find the perfect join order? Regards, Jeff Davis
On Fri, 2010-09-10 at 08:10 -0700, Chris Travers wrote: > Just adding my voice to the "fix it" camp. Is there any reason the > table scans in this sort of thing cannot be independently planned? I don't think it's about independent planning. For instance, AVG clearly can't be planned this way, there are particular properties of MAX that allow the optimization: 1. MAX(x) can be rewritten as: ORDER BY x DESC LIMIT 1 2. The MAX of set S is the MAX of the MAXes of each partition of S The optimizer knows about the former, but not the latter. Regards, Jeff Davis
Jeff Davis wrote: > On Fri, 2010-09-10 at 08:10 -0700, Chris Travers wrote: > >> Just adding my voice to the "fix it" camp. Is there any reason the >> table scans in this sort of thing cannot be independently planned? >> > > I don't think it's about independent planning. For instance, AVG clearly > can't be planned this way, there are particular properties of MAX that > allow the optimization: > > 1. MAX(x) can be rewritten as: ORDER BY x DESC LIMIT 1 > 2. The MAX of set S is the MAX of the MAXes of each partition of S > > The optimizer knows about the former, but not the latter. > > Regards, > Jeff Davis > > Jeff, that's the problem. Functions like "MAX" are rather ordinary and frequently used. Using sequential scan to read all partitions is the wrong thing to do. I agree that AVG() cannot be computed using index but MAX() and MIN() can. I will send you personally 2 versions of a script that I am still writing, just to see to what extent do I go to get the necessary performance. Optimizer definitely needs fixes when it comes to partitions. -- Mladen Gogala Sr. Oracle DBA 1500 Broadway New York, NY 10036 (212) 329-5251 http://www.vmsinfo.com The Leader in Integrated Media Intelligence Solutions
On Fri, 2010-09-10 at 16:53 -0400, Mladen Gogala wrote: > Jeff, that's the problem. Functions like "MAX" are rather ordinary and > frequently used. I agree that it could be improved. The best way to move such improvement forward is to advance the discussion on -hackers. Start with this thread here: http://archives.postgresql.org/pgsql-hackers/2009-07/msg01115.php and see if you have anything to add, or see if you can find related discussions. The main point is that -bugs is usually for short discussions about fixing defects that were just discovered, and -hackers is better for problems without an obvious solution, where discussion is required. Regards, Jeff Davis
On Fri, Sep 10, 2010 at 1:53 PM, Mladen Gogala <mgogala@vmsinfo.com> wrote: > Jeff, that's the problem. Functions like "MAX" are rather ordinary and > frequently used. Using sequential scan to read all partitions is the wrong > thing to do. I agree that AVG() cannot be computed using index but MAX() and > MIN() can. I will send you personally 2 versions of a script that I am still > writing, just to see to what extent do I go to get the necessary > performance. Optimizer definitely needs fixes when it comes to partitions. > IIRC, the planner already has been tweaked to allow index scans on MAX for single tables. This of course did not happen within a stable branch. The question over whether this is a "bug" or a "feature" depends to a large extent on how one defines a bug. I would be inclined to call this a "bug" for discussion purposes since it causes the planner to make plan choices that are well known to be problematic in these cases, but it I would not be in favor of correcting this in a stable branch. My reading of the change log is that it is rare that changes to long-standing behavior in general, and particularly for the optimizer, occur within a stable branch. Given that this is long-standing behavior, I think it is worth accepting that it is not a "bug" we might want fixed within a stable release. I agree with the suggestion that a discussion start on -hackers. I still think it is a problem that should be fixed. Just not in a stable branch, esp. because this has a reasonable workaround (changing to an order by... limit 1). I guess what I am trying to suggest here is that "bug" and "feature" are not distinct categories which have no overlap. Where software, like an RDBMS, is mission-critical, I think it is a good practice to do what the Pg developers do and avoid making unnecessary changes within a stable release. This means that some "bugs" should be treated as "features" where the behavior is longstanding, a workaround is possible, and the fix likely to involve changes to important components. Best Wishes, Chris Travers
Jeff Davis wrote: > On Fri, 2010-09-10 at 16:53 -0400, Mladen Gogala wrote: > >> Jeff, that's the problem. Functions like "MAX" are rather ordinary and >> frequently used. >> > > I agree that it could be improved. The best way to move such improvement > forward is to advance the discussion on -hackers. Start with this thread > here: > > http://archives.postgresql.org/pgsql-hackers/2009-07/msg01115.php > > and see if you have anything to add, or see if you can find related > discussions. The main point is that -bugs is usually for short > discussions about fixing defects that were just discovered, and -hackers > is better for problems without an obvious solution, where discussion is > required. > > Regards, > Jeff Davis > > Thanks, I will do that. However, I cannot call myself a hacker, I am but a mere beginner with Postgres. -- Mladen Gogala Sr. Oracle DBA 1500 Broadway New York, NY 10036 (212) 329-5251 www.vmsinfo.com
On Fri, Sep 10, 2010 at 11:34:16PM -0400, Mladen Gogala wrote: > Jeff Davis wrote: > >On Fri, 2010-09-10 at 16:53 -0400, Mladen Gogala wrote: > >>Jeff, that's the problem. Functions like "MAX" are rather ordinary > >>and frequently used. > > > >I agree that it could be improved. The best way to move such > >improvement forward is to advance the discussion on -hackers. Start > >with this thread here: > > > > http://archives.postgresql.org/pgsql-hackers/2009-07/msg01115.php > > > >and see if you have anything to add, or see if you can find related > >discussions. The main point is that -bugs is usually for short > >discussions about fixing defects that were just discovered, and > >-hackers is better for problems without an obvious solution, where > >discussion is required. > > > Thanks, I will do that. However, I cannot call myself a hacker, I > am but a mere beginner with Postgres. You're a hacker when you start hacking :) Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
On Fri, Sep 10, 2010 at 9:31 AM, Euler Taveira de Oliveira <euler@timbira.com> wrote: > Mladen Gogala escreveu: >> Optimizer chooses to scan each partitioned table sequentially, instead of >> using the available index: >> > This is not a bug. How would the optimizer know that the maximum value is in > that specific partition? There is neither a global index for a partitioned > table nor an optimizer artifact to know aggregate information before scanning > all of the partitions. Maybe when we have a better support for table > partitioning such optimizer artifact would be implemented but don't hold your > breath. I wonder if Merge Append could be made to help with this case. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company