Обсуждение: Odd performance issue

Поиск
Список
Период
Сортировка

Odd performance issue

От
Bart Grantham
Дата:
Summary: depending on the value, the planner will sometimes choose a seq
scan, sometimes an index scan.  The former produces terrible performace,
the latter great performance.

The long story: we had a disk failure (NOT the disk the db was on) and
the machine's system disk had to be rebuilt from the raid array and
re-GRUB'ed.  Now that the the system is back up we are seeing terrible
performance (or more accurately, wildly varying performance).  I've
tried re-importing the data from the live system (this is new hardware
under testing for the system) and re-initing the db cluster.  A specific
example is probably best.  This 'connections' table has about 922K
rows.  The difference here is node_id's 28542 vs. 28560.  Using 28542
causes an index scan, 28560 causes a seq scan:

The details:
logicops2=> explain SELECT * from connections AS c LEFT JOIN
connection_types AS ct ON ct.connection_type_id = c.connection_type_id
WHERE ( connector_node_id = 28542 OR connectee_node_id = 28542 ) AND (
c.connection_type_id < 1000 ) LIMIT 300;
                                                          QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=1.29..563.05 rows=203 width=116)
   ->  Hash Left Join  (cost=1.29..563.05 rows=203 width=116)
         Hash Cond: ("outer".connection_type_id =
"inner".connection_type_id)
         ->  Index Scan using c_connector_node_id, c_connectee_node_id
on connections c  (cost=0.00..558.72 rows=203 width=33)
               Index Cond: ((connector_node_id = 28542) OR
(connectee_node_id = 28542))
               Filter: (connection_type_id < 1000)
         ->  Hash  (cost=1.23..1.23 rows=23 width=83)
               ->  Seq Scan on connection_types ct  (cost=0.00..1.23
rows=23 width=83)
(8 rows)

Time: 0.935 ms
logicops2=> SELECT * from connections AS c LEFT JOIN connection_types AS
ct ON ct.connection_type_id = c.connection_type_id WHERE (
connector_node_id = 28542 OR connectee_node_id = 28542 ) AND (
c.connection_type_id < 1000 ) LIMIT 300;
...results...
(12 rows)

Time: 1.887 ms

-vs-

logicops2=> explain SELECT * from connections AS c LEFT JOIN
connection_types AS ct ON ct.connection_type_id = c.connection_type_id
WHERE ( connector_node_id = 28560 OR connectee_node_id = 28560 ) AND (
c.connection_type_id < 1000 ) LIMIT 300;
                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Limit  (cost=1.29..686.09 rows=300 width=116)
   ->  Hash Left Join  (cost=1.29..24939.39 rows=10925 width=116)
         Hash Cond: ("outer".connection_type_id =
"inner".connection_type_id)
         ->  Seq Scan on connections c  (cost=0.00..24774.23 rows=10925
width=33)
               Filter: (((connector_node_id = 28560) OR
(connectee_node_id = 28560)) AND (connection_type_id < 1000))
         ->  Hash  (cost=1.23..1.23 rows=23 width=83)
               ->  Seq Scan on connection_types ct  (cost=0.00..1.23
rows=23 width=83)
(7 rows)

Time: 0.704 ms
logicops2=> SELECT * from connections AS c LEFT JOIN connection_types AS
ct ON ct.connection_type_id = c.connection_type_id WHERE (
connector_node_id = 28560 OR connectee_node_id = 28560 ) AND (
c.connection_type_id < 1000 ) LIMIT 300;
...results...
(7 rows)

Time: 578.597 ms


... it may be relevant that one node_id has 15 times as many connections:

logicops2=> select count(*) from connections where connector_node_id =
28542 OR connectee_node_id = 28542;
 count
-------
   856
(1 row)

Time: 1.424 ms
logicops2=> select count(*) from connections where connector_node_id =
28560 OR connectee_node_id = 28560;
 count
-------
 13500
(1 row)

Time: 559.696 ms

... but that shouldn't make a difference to the planner, should it?
Yes, I've vacuum analyzed.


Also, I was wondering if someone could correct me on a bit of array
syntax.  I'd like to have a query pass back an array of ints to a
function call.  Something like this:
logicops2=> select * from nodes2ancestors(array[(select node_id from
nodes where node_type_id = 3)]::int[], 0);
ERROR:  more than one row returned by a subquery used as an expression

Thanks for any help/pointers you guys can provide.  I really appreciate
it as I'm down to the wire on a project and this performance thing has
really blindsided us.

Bart

Re: Odd performance issue

От
Tom Lane
Дата:
Bart Grantham <bart@logicworks.net> writes:
> ... it may be relevant that one node_id has 15 times as many connections:
> ... but that shouldn't make a difference to the planner, should it?

Of course it should (and does).

> Yes, I've vacuum analyzed.

What you need to do is show EXPLAIN ANALYZE output.  There's not much
point in complaining that the planner's estimates don't match reality
when you don't present the reality to compare with.

BTW, this would be more on-topic on the pgsql-performance list.

> I'd like to have a query pass back an array of ints to a
> function call.  Something like this:
> logicops2=> select * from nodes2ancestors(array[(select node_id from
> nodes where node_type_id = 3)]::int[], 0);

I think it's "array(select node_id from nodes where node_type_id = 3)"
... no square brackets anywhere.

            regards, tom lane