Discussion: Fwd: views much slower in 9.3 than 8.4
There is only one server at this point. The 8.4 machine was upgraded to 9.3 about a year ago and we have no 8.4 backups, so it's difficult if not impossible to recreate the 8.4 environment AFAIK. One of our developers pointed out the discrepancy in execution times. I decomposed a slow view and found out that it consists of a view calling a view calling a view (3 deep). This is the EXPLAIN ANALYZE plan of the innermost view: http://explain.depesz.com/s/IMg

And as you requested:

SELECT version();
PostgreSQL 9.3.4

SELECT name, current_setting(name), source
FROM pg_settings
WHERE source NOT IN ('default', 'override');

autovacuum,on,configuration file
checkpoint_completion_target,0.9,configuration file
checkpoint_segments,16,configuration file
client_encoding,UTF8,session
client_min_messages,notice,configuration file
DateStyle,"ISO, MDY",configuration file
deadlock_timeout,5s,configuration file
default_text_search_config,pg_catalog.english,configuration file
effective_cache_size,8GB,configuration file
effective_io_concurrency,6,configuration file
lc_messages,en_US.UTF-8,configuration file
lc_monetary,en_US.UTF-8,configuration file
lc_numeric,en_US.UTF-8,configuration file
lc_time,en_US.UTF-8,configuration file
listen_addresses,*,configuration file
log_checkpoints,on,configuration file
log_connections,on,configuration file
log_destination,stderr,configuration file
log_directory,/dbms/postgresql/logs/dtfdev,configuration file
log_disconnections,on,configuration file
log_duration,off,configuration file
log_error_verbosity,verbose,configuration file
log_filename,postgresql-%a.log,configuration file
log_hostname,on,configuration file
log_line_prefix,"%t [%p]: [%l-1] db=%d,user=%u ",configuration file
log_lock_waits,on,configuration file
log_min_duration_statement,0,configuration file
log_min_error_statement,error,configuration file
log_min_messages,warning,configuration file
log_rotation_age,1d,configuration file
log_rotation_size,500MB,configuration file
log_statement,none,configuration file
log_temp_files,0,configuration file
log_timezone,US/Pacific,configuration file
log_truncate_on_rotation,on,configuration file
logging_collector,on,configuration file
maintenance_work_mem,256MB,configuration file
max_connections,200,configuration file
max_stack_depth,8MB,configuration file
port,2222,configuration file
random_page_cost,2,configuration file
search_path,"acct, ""$user"", public",session
shared_buffers,4GB,configuration file
ssl,on,configuration file
temp_buffers,16MB,configuration file
TimeZone,US/Pacific,configuration file
wal_level,minimal,configuration file
wal_sync_method,fdatasync,configuration file
work_mem,5MB,configuration file

The server has 24GB of RAM. From postgresql.conf:

shared_buffers = 4GB
effective_cache_size = 8GB
work_mem = 5MB (note: I increased work_mem to 500MB and repeated the experiment; no difference in execution time)

On Mar 18, 2015, at 10:16 AM, Kevin Grittner <kgrittn@ymail.com> wrote:

"Carson, Leonard" <lcarson@sdsc.edu> wrote:

While acknowledging that nested loops and sequential table scans
account for 85% of the execution time which suggests that a
better query may be needed, why would the same query run in
seconds on 8.x but take minutes on 9.x?
First, please show the output of this from both servers:
SELECT version();
SELECT name, current_setting(name), source
FROM pg_settings
WHERE source NOT IN ('default', 'override');
Then, for your newer server, please follow the steps outlined here:
https://wiki.postgresql.org/wiki/SlowQueryQuestions
My first guess would be that at some point your costing parameters
were tuned on the old system, but have not yet been tuned on the
new one. Rather than blindly using the old settings for the new
server, it would be good to see the information requested on the
above-cited page to determine good settings for the new server.
--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
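[Editorial sketch, not part of the original thread.] The kind of session-level experiment Kevin's advice points at can be done without touching postgresql.conf: override the costing parameters for the current session, re-run the slow query under EXPLAIN ANALYZE, and see whether the plan shape changes. The parameter values and the view name below are illustrative assumptions, not recommendations for this server.

-- Try alternative costing parameters for the current session only, then
-- re-run the slow query to see whether the chosen plan changes.
SET random_page_cost = 1.5;          -- default is 4.0; this server's config sets 2
SET effective_cache_size = '16GB';   -- roughly RAM minus shared_buffers on a 24GB box
SET work_mem = '64MB';               -- per-sort/hash memory, session-local here

EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM some_slow_view;        -- hypothetical view name

RESET ALL;                           -- return to the configured values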
"Carson, Leonard" <lcarson@sdsc.edu> writes: > There is only one server at this point. The 8.4 machine was upgraded to 9.3 about a year ago and we have no 8.4 backupsso it's difficult if not impossible to recreate the 8.4 environment AFAIK. One of our developers pointed out thediscrepancy in execution times. I decomposed a slow view and found out that it consists of a view calling a view callinga view (3 deep). This is the analyze explain plan of the innermost view: > http://explain.depesz.com/s/IMg You're probably going to need to show us the actual view definitions. I'm suspicious that the underlying cause might have to do with recent versions being warier about optimizing sub-selects containing volatile functions than 8.4 was. However, that theory doesn't seem to explain the horribly bad join size estimates you're showing. regards, tom lane
Here are the 3 views and some timing notes: http://pgsql.privatepaste.com/decae31693#
thanks, lcarson
On Mar 18, 2015, at 3:41 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
"Carson, Leonard" <lcarson@sdsc.edu> writes:There is only one server at this point. The 8.4 machine was upgraded to 9.3 about a year ago and we have no 8.4 backups so it's difficult if not impossible to recreate the 8.4 environment AFAIK. One of our developers pointed out the discrepancy in execution times. I decomposed a slow view and found out that it consists of a view calling a view calling a view (3 deep). This is the analyze explain plan of the innermost view:http://explain.depesz.com/s/IMg
You're probably going to need to show us the actual view definitions.
I'm suspicious that the underlying cause might have to do with recent
versions being warier about optimizing sub-selects containing volatile
functions than 8.4 was. However, that theory doesn't seem to explain
the horribly bad join size estimates you're showing.
regards, tom lane
"Carson, Leonard" <lcarson@sdsc.edu> writes: > Here are the 3 views and some timing notes: > http://pgsql.privatepaste.com/decae31693# That doesn't really leave us any wiser than before, unfortunately. It's clear that the root of the problem is the drastic underestimation of the size of the rq/a join, but it's not clear why that's happening, nor why 8.4 would not have fallen into the same trap. Would it be possible to provide the data in the join columns involved in that part of the query? To wit requests.account_id requests.start_date allocations.account_id allocations.initial_start_date allocations.resource_id regards, tom lane
I wrote:
> "Carson, Leonard" <lcarson@sdsc.edu> writes:
>> Here are the 3 views and some timing notes:
>> http://pgsql.privatepaste.com/decae31693#

> That doesn't really leave us any wiser than before, unfortunately.
> It's clear that the root of the problem is the drastic underestimation of
> the size of the rq/a join, but it's not clear why that's happening, nor
> why 8.4 would not have fallen into the same trap.

Leonard was kind enough to provide the problematic data off-list, and here's what I find after some poking around: 8.4 is not, in fact, any smarter than the more recent versions; it just happens to get lucky on this particular query.

The core of the problem is this aspect of the projv view:

SELECT ...
FROM allocations a, ... other relations ...
WHERE a.initial_start_date =
        (SELECT max(allocations.initial_start_date)
         FROM allocations
         WHERE allocations.account_id = a.account_id
           AND allocations.resource_id = a.resource_id)
  AND ... a bunch of other conditions ...

(There's a similar consider-only-the-latest-row construction for accounts_history, which doubles the problem, but let's just look at this one for now.)

Now there are two things that are bad about this: first, the construct requires executing the MAX-subselect for every row of "a", which is expensive, and the planner knows it. Second, it's very very hard for the planner to guess right about the selectivity of this condition on a.initial_start_date. It ends up using DEFAULT_EQ_SEL, which is 0.005, but given Leonard's data set the actual selectivity is just about 1.0, ie, there are no records that aren't the latest for their account_id cum resource_id, and thus no rows are eliminated by the condition anyway. So we have an expensive scan on "a" that is going to produce many more rows than the planner thinks.

By the time we get done joining to accounts_history, which has a similar problem, the planner is estimating only one row out of the join (vs. nearly 24000 in reality), and it's setting the total cost estimate at 148163 cost units. This just totally bollixes planning of the joins to the remaining half-dozen tables. The single-row estimate is nearly fatal in itself, because it encourages nestloop joining, which is pretty inappropriate here. But the other problem is that the planner considers less-than-1% differences in cost estimates to be "in the noise", which means that it's not going to consider cost differences of less than 1480 units in the remaining join steps to be significant. This is how come we end up with the apparently brain-dead decisions to use seqscans on some of the other tables such as "pi" and "ac": comparing the seqscan to a potential inner indexscan, the total cost of the join is "the same" according to the 1% rule, and then the first tiebreaker is startup cost, and the indexscan has a higher startup cost.

Now, 8.4 also had the 1% rule, but it had slightly different tiebreaking procedures, which caused it to end up picking the inner indexscans over the seqscans; and in this particular data set inner indexscans do far better than repeated seqscans when the rq/a/ah join turns out to produce 24000 times more tuples than predicted. But I can't persuade myself that the tiebreaking changes amount to a bug. (I did experiment with varying the tiebreaking rules a bit, but I think that would just be moving the pain around.)

Long-term planner fixes for this type of problem might include improving the statistics enough that we could get better rowcount estimates. (Cross-column stats would help, since a contributing factor is that some of the joins are on two join columns that are pretty heavily correlated.) Another thing we've discussed is making risk estimates, whereby we could realize that the nestloop-plus-seqscan plans are going to be a lot worse if our rowcount estimates are off at all. But both of those things are research projects.

What seems like a potential near-term fix for Leonard is to recast his views to do the latest-row selection more intelligently. I experimented with redoing the projv view like this to eliminate the subselects-in-WHERE:

SELECT ...
FROM acct.requests rq,
     acct.fields_of_science fos,
     acct.accounts ac,
     acct.allocations a,
     (SELECT account_id, resource_id, max(initial_start_date) AS initial_start_date
      FROM acct.allocations GROUP BY 1, 2) a_latest,
     acct.transaction_types tt,
     acct.resources ar,
     acct.accounts_history ah,
     (SELECT account_id, resource_id, max(activity_time) AS activity_time
      FROM acct.accounts_history GROUP BY 1, 2) ah_latest,
     acct.allocation_states sx,
     acct.principal_investigators pi,
     acct.people p
WHERE a.account_id = ac.account_id
  AND a.account_id = a_latest.account_id
  AND a.resource_id = a_latest.resource_id
  AND a.initial_start_date = a_latest.initial_start_date
  AND rq.account_id = a.account_id
  AND rq.start_date = a.initial_start_date
  AND ar.resource_id = a.resource_id
  AND a.allocation_type_id = tt.transaction_type_id
  AND ah.account_id = a.account_id
  AND ah.resource_id = a.resource_id
  AND ah.account_id = ah_latest.account_id
  AND ah.resource_id = ah_latest.resource_id
  AND ah.activity_time = ah_latest.activity_time
  AND sx.state_id = ah.state_id
  AND rq.primary_fos_id = fos.field_of_science_id
  AND pi.request_id = rq.request_id
  AND p.person_id = pi.person_id;

That produces significantly better plans. It doesn't look like the rowcount estimates are better :-( ... but the total estimated cost is now down in the range of 3000 or so cost units, which means that the 1% rule doesn't keep us from adopting the inner indexscans. And this is fundamentally a better way to do latest-row selection, anyhow.

(I guess another potential research project is to do this sort of aggregated-subselect transformation automatically. But don't hold your breath for that to happen, either.)

regards, tom lane
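[Editorial sketch, not part of the original thread.] A rough way to see the mis-estimated selectivity Tom describes is to count how many allocations rows are not the latest for their (account_id, resource_id) pair. This sketch reuses the table and column names from Tom's rewrite above and assumes direct access to the acct schema:

-- Count allocations rows that are NOT the latest for their
-- (account_id, resource_id). Per Tom's analysis this is essentially zero
-- in Leonard's data, so the latest-row condition filters almost nothing,
-- while the planner assumes DEFAULT_EQ_SEL (0.005) and expects it to
-- discard 99.5% of the rows.
SELECT count(*) AS non_latest_rows
FROM acct.allocations a
WHERE a.initial_start_date <
        (SELECT max(al.initial_start_date)
         FROM acct.allocations al
         WHERE al.account_id  = a.account_id
           AND al.resource_id = a.resource_id);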
Tom Lane <tgl@sss.pgh.pa.us> wrote:
> But the other problem is that the planner considers less-than-1%
> differences in cost estimates to be "in the noise", which means
> that it's not going to consider cost differences of less than
> 1480 units in the remaining join steps to be significant. This
> is how come we end up with the apparently brain-dead decisions to
> use seqscans on some of the other tables such as "pi" and "ac":
> comparing the seqscan to a potential inner indexscan, the total
> cost of the join is "the same" according to the 1% rule,

The 1% rule itself might be something to add to the R&D list. I have seen it cause big problems in production, although the users in that case had made a mistake which significantly contributed to it being an issue. They had used the enable_seqscan = off setting for one query which they had been unable to wrestle into good performance in other ways, but accidentally neglected to turn it back on after that query.

Now, seqscans were rarely a good idea with their permanent tables, but they had a couple of queries which used very small temporary tables with no indexes. It chose the seqscan in spite of the setting; but, when run with seqscans off, that gave all candidate plans such a high cost that they all looked "equal" and the tie-breaker logic picked a horrible one. (The faster plans did have lower cost, but not by enough to exceed the 1% threshold.)

Now, had they not made a questionable choice in disabling seqscans in production, compounded by an error in not turning the setting back on again, they would not have had their main web application slow to unusable levels at times -- but it seems to me that it might be reasonable to have some absolute maximum cost-difference test that needs to be met in addition to the percentage difference, as a kind of "safety" on this foot-gun. I'm not sold on this as being a good idea, and had not been planning on raising it without further research; but since it plays into this other scenario it seems worth mentioning as material for potential R&D.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
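[Editorial sketch, not part of the original thread.] The accident Kevin describes is easier to avoid if planner toggles like enable_seqscan are scoped to a single transaction rather than set for the whole session or server. A minimal sketch of that pattern; the table name is a placeholder:

-- SET LOCAL confines the override to this transaction, so it reverts
-- automatically at COMMIT or ROLLBACK and cannot be accidentally left off.
BEGIN;
SET LOCAL enable_seqscan = off;
SELECT count(*) FROM some_stubborn_table;   -- hypothetical: the one query that needs the override
COMMIT;                                     -- enable_seqscan is back to its previous value here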
Kevin Grittner <kgrittn@ymail.com> writes:
> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> But the other problem is that the planner considers less-than-1%
>> differences in cost estimates to be "in the noise", which means
>> that it's not going to consider cost differences of less than
>> 1480 units in the remaining join steps to be significant. This
>> is how come we end up with the apparently brain-dead decisions to
>> use seqscans on some of the other tables such as "pi" and "ac":
>> comparing the seqscan to a potential inner indexscan, the total
>> cost of the join is "the same" according to the 1% rule,

> The 1% rule itself might be something to add to the R&D list.

Perhaps. But it does make for a significant difference in planner speed, and I would argue that any case where it really hurts is by definition a cost estimation failure somewhere else.

> [ disable_cost skews the behavior pretty badly ]

True. Your example suggests that it might be nice to have something other than a cost-delta way of discriminating against seqscans. In principle this consideration could be added to add_path(), although I'm pretty hesitant to make that function even more complex/slower. Perhaps another way would be to generate seqscan paths last (I think they're first at the moment), and only generate them if we didn't find any other path for the rel. Nestloops for join rels have the same issue and would need to be handled similarly, whatever solution we pick.

regards, tom lane
On 3/30/15 10:52 AM, Tom Lane wrote:
> Kevin Grittner <kgrittn@ymail.com> writes:
>> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> But the other problem is that the planner considers less-than-1%
>>> differences in cost estimates to be "in the noise", which means
>>> that it's not going to consider cost differences of less than
>>> 1480 units in the remaining join steps to be significant. This
>>> is how come we end up with the apparently brain-dead decisions to
>>> use seqscans on some of the other tables such as "pi" and "ac":
>>> comparing the seqscan to a potential inner indexscan, the total
>>> cost of the join is "the same" according to the 1% rule,

>> The 1% rule itself might be something to add to the R&D list.

> Perhaps. But it does make for a significant difference in planner speed,
> and I would argue that any case where it really hurts is by definition
> a cost estimation failure somewhere else.

What I wish we had was some way to represent "confidence" in the accuracy of a specific plan node, with the goal of avoiding plans that cost out slightly cheaper but will blow up spectacularly if we guessed wrong on something. Nested loops are an example; if you miscalculate either of the sides by very much you can end up with a real mess unless the rowcounts were already pretty trivial to begin with.

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com