Обсуждение: Optimisation, index use question [long]

Поиск
Список
Период
Сортировка

Optimisation, index use question [long]

От
Francois Suter
Дата:
Hi all,

I am having my first try at optimizing the way queries are run on one of my
databases and I'm having a few problems/questions.

Here is the picture (kept simple): I have a table that is at the center of a
many-to-many relationship with about 300K records. Here is its definition:

blocop2=# \d protocolitems
                               Table "public.protocolitems"
         Column         |   Type   |                       Modifiers
------------------------+----------+----------------------------------------
---------------
 protocolitemID         | bigint   | not null default
nextval('"protocolitems_seq"'::text)
 protocolitemProtocolID | bigint   | not null
 protocolitemItemID     | bigint   | not null
 protocolitemQuantity   | smallint | not null default 0
Indexes: protocolitems_pkey primary key btree ("protocolitemID"),
         protocol_item_pair unique btree ("protocolitemProtocolID",
"protocolitemItemID"),
         item_index btree ("protocolitemItemID"),
         protocol_index btree ("protocolitemProtocolID")
Foreign Key constraints: protocol_used FOREIGN KEY
("protocolitemProtocolID") REFERENCES protocols("protocolID") ON UPDATE NO
ACTION ON DELETE CASCADE,
                         article_referenced FOREIGN KEY
("protocolitemItemID") REFERENCES items("itemID") ON UPDATE NO ACTION ON
DELETE CASCADE

So it's got a primary key and two foreign keys. I decided to index the two
foreign keys to that they could be queried more quickly. I also defined a
unique constraint because I cannot have duplicate pairs of foreign keys.

That's actually the first question: since the unique constraint already
generates an index on the two foreign keys, do I really need the two other
indexes?

Now the problem I have is that, although this table is not huge, it's still
quite large and operations on a couple of thousand records are taking quite
some time. Roughly said an update/delete operation on a single record takes
about 1 to 1.5 second. Sometimes I have to operate on 4000 records in a go,
which means between 4000 and 6000 seconds. The application having a web
interface, I guess you can imagine I have a timeout problem on my hands. I
can access the server configuration and change the timeout limits, but
changing it to that extent (1.5 hours!) doesn't seem like a good solution.

I tried to play around with the indexes and I got surprising results from
the query planner. Here goes. First an extract of the table:

blocop2=# select * from protocolitems where "protocolitemItemID" = 101;
 protocolitemID | protocolitemProtocolID | protocolitemItemID |
protocolitemQuantity
----------------+------------------------+--------------------+-------------
---------
         247501 |                      1 |                101 |
0
         247502 |                      2 |                101 |
0
         247503 |                      3 |                101 |
0
         247504 |                      4 |                101 |
0
         247505 |                      5 |                101 |
0
...
(30 rows)

Then I ran explain analyze on a delete operation of the first record of the
extract above, once with the primary key and once with the foreign key pair:

blocop2=# explain analyze delete from protocolitems where "protocolitemID" =
247501;
                                                  QUERY PLAN
----------------------------------------------------------------------------
----------------------------------
 Seq Scan on protocolitems  (cost=0.00..4971.75 rows=1 width=6) (actual
time=1038.13..1055.69 rows=1 loops=1)
   Filter: ("protocolitemID" = 247501)
 Total runtime: 1056.07 msec
(3 rows)

blocop2=# explain analyze delete from protocolitems where
"protocolitemProtocolID" = 1 and "protocolitemItemID" = 101;
                                                  QUERY PLAN
----------------------------------------------------------------------------
----------------------------------
 Seq Scan on protocolitems  (cost=0.00..5595.50 rows=3 width=6) (actual
time=1048.63..1048.63 rows=0 loops=1)
   Filter: (("protocolitemProtocolID" = 1) AND ("protocolitemItemID" = 101))
 Total runtime: 1048.84 msec
(3 rows)

My first surprise is that the cost/time is the same. I would think that
scanning two columns would be more costly than scanning a single one. Second
surprise, why is the index not used? Third question, does the query planner
take into account the cost of rebuilding the indexes?

I ran the same two operations dropping one index after the other. I spare
you the details: since the indexes weren't used anyway, the results don't
change.

I then reset the indexes and tried to run the queries again by setting
enable_seqscan to off:

blocop2=# set enable_seqscan to off;
SET
blocop2=# explain analyze delete from protocolitems where "protocolitemID" =
247501;
                                                        QUERY PLAN
----------------------------------------------------------------------------
-----------------------------------------------
 Seq Scan on protocolitems  (cost=100000000.00..100004953.74 rows=1 width=6)
(actual time=1210.40..1210.40 rows=0 loops=1)
   Filter: ("protocolitemID" = 247501)
 Total runtime: 1210.71 msec
(3 rows)

blocop2=# explain analyze delete from protocolitems where
"protocolitemProtocolID" = 1 and "protocolitemItemID" = 101;
                                                        QUERY PLAN
----------------------------------------------------------------------------
-----------------------------------------------
 Seq Scan on protocolitems  (cost=100000000.00..100005577.48 rows=1 width=6)
(actual time=1272.72..1272.72 rows=0 loops=1)
   Filter: (("protocolitemProtocolID" = 1) AND ("protocolitemItemID" = 101))
 Total runtime: 1273.00 msec
(3 rows)

Now although the query plan does change, it's still not using the indexes.
Why?

Any help would be appreciated.

Cheers.

--------
Francois

Home page: http://www.monpetitcoin.com/
"We waste our time spending money we don't have to buy things we don't need
to impress people we don't like"


Re: Optimisation, index use question [long]

От
Martijn van Oosterhout
Дата:
On Sun, Jul 13, 2003 at 03:43:06PM +0200, Francois Suter wrote:
> Hi all,
>
> I am having my first try at optimizing the way queries are run on one of my
> databases and I'm having a few problems/questions.

Standard issue. When you specify an unquoted number in a query it's
interpreted as an int4 which doesn't match your indexes. Suggestions are:

- Put quotes around your numbers or   eg. '1'
- Cast them to the right type         eg. 1::bigint

Naturally you analysed, right?
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> "the West won the world not by the superiority of its ideas or values or
> religion but rather by its superiority in applying organized violence.
> Westerners often forget this fact, non-Westerners never do."
>   - Samuel P. Huntington

Вложения

Re: Optimisation, index use question [long]

От
Francois Suter
Дата:
> Standard issue. When you specify an unquoted number in a query it's
> interpreted as an int4 which doesn't match your indexes. Suggestions are:
>
> - Put quotes around your numbers or   eg. '1'
> - Cast them to the right type         eg. 1::bigint

Huh, yeah, I remember now reading about this. Thanks a lot.

Still any idea about why isn't it faster to query on the primary key than on
the two foreign keys?

Cheers


--------
Francois

Home page: http://www.monpetitcoin.com/
"We waste our time spending money we don't have to buy things we don't need
to impress people we don't like"