Обсуждение: Problem with query plan
I have come up with a simple query that runs horribly depending on the
number of columns selected.
select order_lines.*
from orders, order_lines
where orders.merchant_order_id = '11343445' and
order_lines.order_id=orders.order_id;
merchant_order_id is indexed.
order_id is indexed.
Tables are analyzed.
I get the following plan:
---------------------------------------------------------
Merge Join (cost=nan..nan rows=3 width=1257)
Merge Cond: ("outer".order_id = "inner".order_id)
-> Sort (cost=5.33..5.33 rows=2 width=4)
Sort Key: orders.order_id
-> Index Scan using ak_po_number on orders (cost=0.00..5.32
rows=2 width=4)
Index Cond: ((merchant_order_id)::text =
'11343445'::text)
-> Sort (cost=nan..nan rows=2023865 width=1257)
Sort Key: order_lines.order_id
-> Seq Scan on order_lines (cost=0.00..83822.65 rows=2023865
width=1257)
If I restrict the columns (i.e., select 1 from ...), it works great.
I can add columns and it seems that once I get a width of more than
~610, it executes a Merge Join of cost nan that takes forever to return.
If I reduce the columns returned to slightly below this, I get a much
nicer plan:
----------------------------------------------------------
Nested Loop (cost=0.00..16.60 rows=4 width=606)
-> Index Scan using ak_po_number on orders (cost=0.00..5.69 rows=3
width=4)
Index Cond: ((merchant_order_id)::text = '11343445'::text)
-> Index Scan using ak_order_line_doid on order_lines
(cost=0.00..3.61 rows=2 width=610)
Index Cond: (order_lines.order_id = "outer".order_id)
Is this possibly just an overflow that causes a NaN that isn't properly
handled by the optimizer?
This is on Redhat 3.0 AS U3 x86 with the RPMs from postgresql.org.
Thanks!
Cott Lang <cott@internetstaff.com> writes:
> -> Sort (cost=nan..nan rows=2023865 width=1257)
What PG version is this? My recollection is we fixed such a thing quite
some time ago ...
regards, tom lane
Oops, sorry - guess I left that out - 7.4.5. :) On Fri, 2004-10-22 at 12:28, Tom Lane wrote: > Cott Lang <cott@internetstaff.com> writes: > > -> Sort (cost=nan..nan rows=2023865 width=1257) > > What PG version is this? My recollection is we fixed such a thing quite > some time ago ... > > regards, tom lane
Cott Lang <cott@internetstaff.com> writes:
> Oops, sorry - guess I left that out - 7.4.5. :)
Hmm ... I can't duplicate any misbehavior here. Are you using
nondefault values for any planner parameters? (particularly sort_mem,
random_page_cost, effective_cache_size)
regards, tom lane
shared_buffers = 16384 sort_mem = 8192 random_page_cost = 2 effective_cache_size = 3932160 On Fri, 2004-10-22 at 13:32, Tom Lane wrote: > Cott Lang <cott@internetstaff.com> writes: > > Oops, sorry - guess I left that out - 7.4.5. :) > > Hmm ... I can't duplicate any misbehavior here. Are you using > nondefault values for any planner parameters? (particularly sort_mem, > random_page_cost, effective_cache_size) > > regards, tom lane
Cott Lang <cott@internetstaff.com> writes:
> sort_mem = 8192
> random_page_cost = 2
> effective_cache_size = 3932160
effective_cache_size 30Gb ? Seems a tad high ;-)
However, I set up a dummy test case on 7.4.5 and don't see any overflow.
regression=# create table z1(f1 char(1253));
CREATE TABLE
regression=# update pg_class set reltuples=2023865, relpages=65000 where relname = 'z1';
UPDATE 1
regression=# set sort_mem = 8192;
SET
regression=# set random_page_cost = 2;
SET
regression=# set effective_cache_size = 3932160;
SET
regression=# explain select * from z1 order by f1;
QUERY PLAN
---------------------------------------------------------------------
Sort (cost=2200533.17..2205592.83 rows=2023865 width=1257)
Sort Key: f1
-> Seq Scan on z1 (cost=0.00..85238.65 rows=2023865 width=1257)
(3 rows)
Can you try this exact test case and see if you get a NAN?
regards, tom lane
On Fri, 2004-10-22 at 14:19, Tom Lane wrote: > Cott Lang <cott@internetstaff.com> writes: > > sort_mem = 8192 > > random_page_cost = 2 > > effective_cache_size = 3932160 > > effective_cache_size 30Gb ? Seems a tad high ;-) It's a 32GB machine with nothing else running on it except PG, buffers hover around 31GB :) > However, I set up a dummy test case on 7.4.5 and don't see any overflow. > Can you try this exact test case and see if you get a NAN? I don't. After a bounce, I also can't repeat my original case; it now returns 16.60. Fiddling with the above values, only setting sort_mem absurdly large easily causes NAN. My guess is there was a wonky setting for sort_mem that disappeared after I bounced.
Cott Lang <cott@internetstaff.com> writes:
> Fiddling with the above values, only setting sort_mem absurdly large
> easily causes NAN.
Ah. I see an overflow case for sort_mem exceeding 1Gb; that's probably
what you tickled.
I've fixed this in HEAD, but it doesn't seem worth back-patching.
If you care, the change in HEAD is
*** src/backend/optimizer/path/costsize.c.orig Sun Aug 29 01:06:43 2004
--- src/backend/optimizer/path/costsize.c Fri Oct 22 20:02:39 2004
***************
*** 566,572 ****
if (nbytes > work_mem_bytes)
{
double npages = ceil(nbytes / BLCKSZ);
! double nruns = nbytes / (work_mem_bytes * 2);
double log_runs = ceil(LOG6(nruns));
double npageaccesses;
--- 566,572 ----
if (nbytes > work_mem_bytes)
{
double npages = ceil(nbytes / BLCKSZ);
! double nruns = (nbytes / work_mem_bytes) * 0.5;
double log_runs = ceil(LOG6(nruns));
double npageaccesses;
but the variable names have changed since 7.4 so this won't apply
cleanly.
regards, tom lane
Tom Lane wrote:
> Cott Lang <cott@internetstaff.com> writes:
>
>>Fiddling with the above values, only setting sort_mem absurdly large
>>easily causes NAN.
>
>
> Ah. I see an overflow case for sort_mem exceeding 1Gb; that's probably
> what you tickled.
>
> I've fixed this in HEAD, but it doesn't seem worth back-patching.
> If you care, the change in HEAD is
>
> *** src/backend/optimizer/path/costsize.c.orig Sun Aug 29 01:06:43 2004
> --- src/backend/optimizer/path/costsize.c Fri Oct 22 20:02:39 2004
> ***************
> *** 566,572 ****
> if (nbytes > work_mem_bytes)
> {
> double npages = ceil(nbytes / BLCKSZ);
> ! double nruns = nbytes / (work_mem_bytes * 2);
> double log_runs = ceil(LOG6(nruns));
> double npageaccesses;
>
> --- 566,572 ----
> if (nbytes > work_mem_bytes)
> {
> double npages = ceil(nbytes / BLCKSZ);
> ! double nruns = (nbytes / work_mem_bytes) * 0.5;
> double log_runs = ceil(LOG6(nruns));
> double npageaccesses;
>
>
> but the variable names have changed since 7.4 so this won't apply
> cleanly.
If somebody care about apply this for 7.4, here there is the equivalent change:
--- costsize.c.orig 2004-10-23 11:17:38.000000000 +0200
+++ costsize.c 2004-10-23 11:19:04.000000000 +0200
@@ -548,7 +548,7 @@
if (nbytes > sortmembytes)
{
double npages = ceil(nbytes / BLCKSZ);
- double nruns = nbytes / (sortmembytes * 2);
+ double nruns = ( nbytes / sortmembytes ) * 0.5 ;
double log_runs = ceil(LOG6(nruns));
double npageaccesses;
Regards
Gaetano Mendola