Обсуждение: index not used in joins
Hello all, i have a problem with index usage and joins. Attached is some SQL demonstrating my problem; Why is the index only used in the 2nd query? Can anybody explain me how to avoid/fix this. Thanks in advance Sebastian CREATE TABLE users ( login NAME NOT NULL PRIMARY KEY, datum TIMESTAMP, version INTEGER ); CREATE TABLE test ( datum TIMESTAMP NOT NULL, version INTEGER NOT NULL, approved TIMESTAMP ); CREATE OR REPLACE VIEW v AS SELECT t.* FROM test AS t INNER JOIN users AS u ON t.datum <= u.datum AND (t.version = u.version OR t.approved IS NOT NULL); CREATE OR REPLACE FUNCTION fill () RETURNS BOOLEAN AS ' DECLARE i INTEGER; BEGIN FOR i IN 1..1000 LOOP EXECUTE ''INSERT INTO test (datum,version) VALUES (now(),''|| i || '')''; END LOOP; RETURN TRUE; END; ' LANGUAGE plpgsql; CREATE OR REPLACE FUNCTION approved () RETURNS BOOLEAN AS ' DECLARE i INTEGER; BEGIN FOR i IN 1..1000 LOOP EXECUTE ''INSERT INTO test (datum,version,approved) VALUES (now(),''|| i || '',now())''; END LOOP; RETURN TRUE; END; ' LANGUAGE plpgsql; SELECT fill(); SELECT fill(); SELECT fill(); SELECT fill(); SELECT fill(); SELECT fill(); SELECT fill(); SELECT fill(); SELECT fill(); SELECT fill(); SELECT fill(); SELECT fill(); SELECT fill(); SELECT fill(); SELECT fill(); SELECT fill(); SELECT fill(); SELECT fill(); SELECT fill(); SELECT fill(); SELECT approved(); INSERT INTO users (login,datum,version) VALUES ('sb',now(),'999'); CREATE INDEX test_ ON test (datum); CREATE INDEX test_999 ON test (datum) WHERE version = '999' OR approved IS NOT NULL; ANALYZE; EXPLAIN ANALYZE SELECT * FROM v; EXPLAIN ANALYZE SELECT t.* FROM test AS t INNER JOIN users AS u ON t.datum <= u.datum AND (t.version = '999' OR t.approvedIS NOT NULL);
Sebastian Böck wrote: > Richard Huxton wrote: >> Can you post the output from your "explain analyse" calls too? The >> statistics aren't going to be the same on different machines. >> > > Sure, here it is. Thanks. (PS - remember to cc the list too). > EXPLAIN ANALYZE SELECT * FROM v; > QUERY PLAN > ------------------------------------------------------------------------------------------------------------------------------- > > Nested Loop (cost=0.00..263.12 rows=116 width=20) (actual > time=5.171..109.910 rows=1020 loops=1) > Join Filter: (("inner"."version" = "outer"."version") OR > ("inner".approved IS NOT NULL)) > -> Seq Scan on users u (cost=0.00..1.01 rows=1 width=12) (actual > time=0.005..0.009 rows=1 loops=1) > -> Index Scan using test_ on test t (cost=0.00..155.74 rows=7092 > width=20) (actual time=0.012..64.873 rows=21000 loops=1) > Index Cond: (t.datum <= "outer".datum) > Total runtime: 111.879 ms > EXPLAIN ANALYZE SELECT t.* FROM test AS t INNER JOIN users AS u ON > t.datum <= u.datum AND (t.version = '999' OR t.approved IS NOT NULL); > QUERY PLAN > ----------------------------------------------------------------------------------------------------------------------------- > > Nested Loop (cost=0.00..7.78 rows=133 width=20) (actual > time=0.035..7.733 rows=1020 loops=1) > -> Seq Scan on users u (cost=0.00..1.01 rows=1 width=8) (actual > time=0.006..0.010 rows=1 loops=1) > -> Index Scan using test_999 on test t (cost=0.00..5.11 rows=132 > width=20) (actual time=0.017..3.358 rows=1020 loops=1) > Index Cond: (t.datum <= "outer".datum) > Filter: (("version" = 999) OR (approved IS NOT NULL)) > Total runtime: 9.528 ms OK - so what you want to know is why index "test_999" is used in the second but not the first, even though both return the same rows. The fact is that the conditional index: CREATE INDEX test_999 ON test (datum) WHERE version = '999' OR approved IS NOT NULL; AFAIK looks at the WHERE clause of your query to determine where it can run. Don't forget that the planner needs to pick which index is best *before* it starts fetching data. So - in the first example there might be rows where e.g. t.version=998 which means test_999 would be a poor choice of index. -- Richard Huxton Archonet Ltd
Richard Huxton wrote: > Sebastian Böck wrote: > >> Richard Huxton wrote: >> >>> Can you post the output from your "explain analyse" calls too? The >>> statistics aren't going to be the same on different machines. >>> >> >> Sure, here it is. > > > Thanks. (PS - remember to cc the list too). [output of EXPLAIN ANALYZE] > OK - so what you want to know is why index "test_999" is used in the > second but not the first, even though both return the same rows. > > The fact is that the conditional index: > CREATE INDEX test_999 ON test (datum) > WHERE version = '999' OR approved IS NOT NULL; > AFAIK looks at the WHERE clause of your query to determine where it can > run. Don't forget that the planner needs to pick which index is best > *before* it starts fetching data. > > So - in the first example there might be rows where e.g. t.version=998 > which means test_999 would be a poor choice of index. But what if the table users contains only 1 row and the column "version" has a value of "999"? Are there any other options to speed up this kind of query? Thanks so far Sebastian
Sebastian Böck wrote: > Richard Huxton wrote: > >> Sebastian Böck wrote: >> >>> Richard Huxton wrote: >>> >>>> Can you post the output from your "explain analyse" calls too? The >>>> statistics aren't going to be the same on different machines. >>>> >>> >>> Sure, here it is. >> >> >> >> Thanks. (PS - remember to cc the list too). > > > [output of EXPLAIN ANALYZE] > >> OK - so what you want to know is why index "test_999" is used in the >> second but not the first, even though both return the same rows. >> >> The fact is that the conditional index: >> CREATE INDEX test_999 ON test (datum) >> WHERE version = '999' OR approved IS NOT NULL; >> AFAIK looks at the WHERE clause of your query to determine where it >> can run. Don't forget that the planner needs to pick which index is >> best *before* it starts fetching data. >> >> So - in the first example there might be rows where e.g. t.version=998 >> which means test_999 would be a poor choice of index. > > > But what if the table users contains only 1 row and the column "version" > has a value of "999"? It still doesn't know that the only value in "version" is 999(*). Let's say there were 2000 rows and 1900 had the value 999 - the index is still useless because we'd have to do a sequential scan to check the remaining 200 rows. > Are there any other options to speed up this kind of query? Well, your problem is the (version=X OR approved IS NOT NULL) clause. I must admit I can't quite see what this is supposed to do. The "test" table connects to the "users" table via "version" (and "datum", though not a simple check) unless the "test" has been "approved", in which case it applies to all users? Can you explain what the various tables/columns are really for? (*) Don't forget the statistics for column values are usually out-of-date compared to the actual data, so you can't rely on it. -- Richard Huxton Archonet Ltd
Richard Huxton wrote: > Sebastian Böck wrote: > >> Richard Huxton wrote: >> >>> Sebastian Böck wrote: >>> >>>> Richard Huxton wrote: >>>> >>>>> Can you post the output from your "explain analyse" calls too? The >>>>> statistics aren't going to be the same on different machines. >>>>> >>>> >>>> Sure, here it is. >>> >>> >>> >>> >>> Thanks. (PS - remember to cc the list too). >> >> >> >> [output of EXPLAIN ANALYZE] >> >>> OK - so what you want to know is why index "test_999" is used in the >>> second but not the first, even though both return the same rows. >>> >>> The fact is that the conditional index: >>> CREATE INDEX test_999 ON test (datum) >>> WHERE version = '999' OR approved IS NOT NULL; >>> AFAIK looks at the WHERE clause of your query to determine where it >>> can run. Don't forget that the planner needs to pick which index is >>> best *before* it starts fetching data. >>> >>> So - in the first example there might be rows where e.g. >>> t.version=998 which means test_999 would be a poor choice of index. >> >> >> >> But what if the table users contains only 1 row and the column "version" >> has a value of "999"? > > > It still doesn't know that the only value in "version" is 999(*). Let's > say there were 2000 rows and 1900 had the value 999 - the index is still > useless because we'd have to do a sequential scan to check the remaining > 200 rows. > >> Are there any other options to speed up this kind of query? > > > Well, your problem is the (version=X OR approved IS NOT NULL) clause. I > must admit I can't quite see what this is supposed to do. The "test" > table connects to the "users" table via "version" (and "datum", though > not a simple check) unless the "test" has been "approved", in which case > it applies to all users? > Can you explain what the various tables/columns are really for? The whole thing is a multiuser facility managment application. Every user can plan things like he wants (different versions). All these changes apply to a common (approved) version. Things get complicated as everybody should be able to "travel" through the history via the "datum" field. That's why i need this "silly OR" in my where-clause. At the moment i get very exciting results using immutable functions, but i have another question. In the docs it is stated that: IMMUTABLE indicates that the function always returns the same result when given the same argument values; What if i define my functions like: CREATE OR REPLACE FUNCTION datum () RETURNS TIMESTAMP AS ' SELECT datum FROM public.benutzer; ' LANGUAGE sql IMMUTABLE; They normally (untill now) give the correct results, also if the values in the underlaying view changes. Can i relay on this or is it only luck. > (*) Don't forget the statistics for column values are usually > out-of-date compared to the actual data, so you can't rely on it. I'm aware of that. Thanks Sebastian