Обсуждение: index not used in joins
Hello all,
i have a problem with index usage and joins.
Attached is some SQL demonstrating my problem;
Why is the index only used in the 2nd query?
Can anybody explain me how to avoid/fix this.
Thanks in advance
Sebastian
CREATE TABLE users (
login NAME NOT NULL PRIMARY KEY,
datum TIMESTAMP,
version INTEGER
);
CREATE TABLE test (
datum TIMESTAMP NOT NULL,
version INTEGER NOT NULL,
approved TIMESTAMP
);
CREATE OR REPLACE VIEW v AS
SELECT t.*
FROM test AS t
INNER JOIN users AS u ON
t.datum <= u.datum AND
(t.version = u.version OR
t.approved IS NOT NULL);
CREATE OR REPLACE FUNCTION fill () RETURNS BOOLEAN AS '
DECLARE
i INTEGER;
BEGIN
FOR i IN 1..1000 LOOP
EXECUTE ''INSERT INTO test (datum,version)
VALUES (now(),''|| i || '')'';
END LOOP;
RETURN TRUE;
END;
' LANGUAGE plpgsql;
CREATE OR REPLACE FUNCTION approved () RETURNS BOOLEAN AS '
DECLARE
i INTEGER;
BEGIN
FOR i IN 1..1000 LOOP
EXECUTE ''INSERT INTO test (datum,version,approved)
VALUES (now(),''|| i || '',now())'';
END LOOP;
RETURN TRUE;
END;
' LANGUAGE plpgsql;
SELECT fill();
SELECT fill();
SELECT fill();
SELECT fill();
SELECT fill();
SELECT fill();
SELECT fill();
SELECT fill();
SELECT fill();
SELECT fill();
SELECT fill();
SELECT fill();
SELECT fill();
SELECT fill();
SELECT fill();
SELECT fill();
SELECT fill();
SELECT fill();
SELECT fill();
SELECT fill();
SELECT approved();
INSERT INTO users (login,datum,version) VALUES ('sb',now(),'999');
CREATE INDEX test_ ON test (datum);
CREATE INDEX test_999 ON test (datum) WHERE version = '999' OR approved IS NOT NULL;
ANALYZE;
EXPLAIN ANALYZE SELECT * FROM v;
EXPLAIN ANALYZE SELECT t.* FROM test AS t INNER JOIN users AS u ON t.datum <= u.datum AND (t.version = '999' OR
t.approvedIS NOT NULL);
Sebastian Böck wrote:
> Richard Huxton wrote:
>> Can you post the output from your "explain analyse" calls too? The
>> statistics aren't going to be the same on different machines.
>>
>
> Sure, here it is.
Thanks. (PS - remember to cc the list too).
> EXPLAIN ANALYZE SELECT * FROM v;
> QUERY PLAN
>
-------------------------------------------------------------------------------------------------------------------------------
>
> Nested Loop (cost=0.00..263.12 rows=116 width=20) (actual
> time=5.171..109.910 rows=1020 loops=1)
> Join Filter: (("inner"."version" = "outer"."version") OR
> ("inner".approved IS NOT NULL))
> -> Seq Scan on users u (cost=0.00..1.01 rows=1 width=12) (actual
> time=0.005..0.009 rows=1 loops=1)
> -> Index Scan using test_ on test t (cost=0.00..155.74 rows=7092
> width=20) (actual time=0.012..64.873 rows=21000 loops=1)
> Index Cond: (t.datum <= "outer".datum)
> Total runtime: 111.879 ms
> EXPLAIN ANALYZE SELECT t.* FROM test AS t INNER JOIN users AS u ON
> t.datum <= u.datum AND (t.version = '999' OR t.approved IS NOT NULL);
> QUERY PLAN
>
-----------------------------------------------------------------------------------------------------------------------------
>
> Nested Loop (cost=0.00..7.78 rows=133 width=20) (actual
> time=0.035..7.733 rows=1020 loops=1)
> -> Seq Scan on users u (cost=0.00..1.01 rows=1 width=8) (actual
> time=0.006..0.010 rows=1 loops=1)
> -> Index Scan using test_999 on test t (cost=0.00..5.11 rows=132
> width=20) (actual time=0.017..3.358 rows=1020 loops=1)
> Index Cond: (t.datum <= "outer".datum)
> Filter: (("version" = 999) OR (approved IS NOT NULL))
> Total runtime: 9.528 ms
OK - so what you want to know is why index "test_999" is used in the
second but not the first, even though both return the same rows.
The fact is that the conditional index:
CREATE INDEX test_999 ON test (datum)
WHERE version = '999' OR approved IS NOT NULL;
AFAIK looks at the WHERE clause of your query to determine where it can
run. Don't forget that the planner needs to pick which index is best
*before* it starts fetching data.
So - in the first example there might be rows where e.g. t.version=998
which means test_999 would be a poor choice of index.
--
Richard Huxton
Archonet Ltd
Richard Huxton wrote: > Sebastian Böck wrote: > >> Richard Huxton wrote: >> >>> Can you post the output from your "explain analyse" calls too? The >>> statistics aren't going to be the same on different machines. >>> >> >> Sure, here it is. > > > Thanks. (PS - remember to cc the list too). [output of EXPLAIN ANALYZE] > OK - so what you want to know is why index "test_999" is used in the > second but not the first, even though both return the same rows. > > The fact is that the conditional index: > CREATE INDEX test_999 ON test (datum) > WHERE version = '999' OR approved IS NOT NULL; > AFAIK looks at the WHERE clause of your query to determine where it can > run. Don't forget that the planner needs to pick which index is best > *before* it starts fetching data. > > So - in the first example there might be rows where e.g. t.version=998 > which means test_999 would be a poor choice of index. But what if the table users contains only 1 row and the column "version" has a value of "999"? Are there any other options to speed up this kind of query? Thanks so far Sebastian
Sebastian Böck wrote: > Richard Huxton wrote: > >> Sebastian Böck wrote: >> >>> Richard Huxton wrote: >>> >>>> Can you post the output from your "explain analyse" calls too? The >>>> statistics aren't going to be the same on different machines. >>>> >>> >>> Sure, here it is. >> >> >> >> Thanks. (PS - remember to cc the list too). > > > [output of EXPLAIN ANALYZE] > >> OK - so what you want to know is why index "test_999" is used in the >> second but not the first, even though both return the same rows. >> >> The fact is that the conditional index: >> CREATE INDEX test_999 ON test (datum) >> WHERE version = '999' OR approved IS NOT NULL; >> AFAIK looks at the WHERE clause of your query to determine where it >> can run. Don't forget that the planner needs to pick which index is >> best *before* it starts fetching data. >> >> So - in the first example there might be rows where e.g. t.version=998 >> which means test_999 would be a poor choice of index. > > > But what if the table users contains only 1 row and the column "version" > has a value of "999"? It still doesn't know that the only value in "version" is 999(*). Let's say there were 2000 rows and 1900 had the value 999 - the index is still useless because we'd have to do a sequential scan to check the remaining 200 rows. > Are there any other options to speed up this kind of query? Well, your problem is the (version=X OR approved IS NOT NULL) clause. I must admit I can't quite see what this is supposed to do. The "test" table connects to the "users" table via "version" (and "datum", though not a simple check) unless the "test" has been "approved", in which case it applies to all users? Can you explain what the various tables/columns are really for? (*) Don't forget the statistics for column values are usually out-of-date compared to the actual data, so you can't rely on it. -- Richard Huxton Archonet Ltd
Richard Huxton wrote: > Sebastian Böck wrote: > >> Richard Huxton wrote: >> >>> Sebastian Böck wrote: >>> >>>> Richard Huxton wrote: >>>> >>>>> Can you post the output from your "explain analyse" calls too? The >>>>> statistics aren't going to be the same on different machines. >>>>> >>>> >>>> Sure, here it is. >>> >>> >>> >>> >>> Thanks. (PS - remember to cc the list too). >> >> >> >> [output of EXPLAIN ANALYZE] >> >>> OK - so what you want to know is why index "test_999" is used in the >>> second but not the first, even though both return the same rows. >>> >>> The fact is that the conditional index: >>> CREATE INDEX test_999 ON test (datum) >>> WHERE version = '999' OR approved IS NOT NULL; >>> AFAIK looks at the WHERE clause of your query to determine where it >>> can run. Don't forget that the planner needs to pick which index is >>> best *before* it starts fetching data. >>> >>> So - in the first example there might be rows where e.g. >>> t.version=998 which means test_999 would be a poor choice of index. >> >> >> >> But what if the table users contains only 1 row and the column "version" >> has a value of "999"? > > > It still doesn't know that the only value in "version" is 999(*). Let's > say there were 2000 rows and 1900 had the value 999 - the index is still > useless because we'd have to do a sequential scan to check the remaining > 200 rows. > >> Are there any other options to speed up this kind of query? > > > Well, your problem is the (version=X OR approved IS NOT NULL) clause. I > must admit I can't quite see what this is supposed to do. The "test" > table connects to the "users" table via "version" (and "datum", though > not a simple check) unless the "test" has been "approved", in which case > it applies to all users? > Can you explain what the various tables/columns are really for? The whole thing is a multiuser facility managment application. Every user can plan things like he wants (different versions). All these changes apply to a common (approved) version. Things get complicated as everybody should be able to "travel" through the history via the "datum" field. That's why i need this "silly OR" in my where-clause. At the moment i get very exciting results using immutable functions, but i have another question. In the docs it is stated that: IMMUTABLE indicates that the function always returns the same result when given the same argument values; What if i define my functions like: CREATE OR REPLACE FUNCTION datum () RETURNS TIMESTAMP AS ' SELECT datum FROM public.benutzer; ' LANGUAGE sql IMMUTABLE; They normally (untill now) give the correct results, also if the values in the underlaying view changes. Can i relay on this or is it only luck. > (*) Don't forget the statistics for column values are usually > out-of-date compared to the actual data, so you can't rely on it. I'm aware of that. Thanks Sebastian