Discussion: Fast distinct not working as expected
Hi,

we are using a mono-column index on a huge table to try to make a quick 'select distinct' on the column. This used to work fine, but... it does not anymore. We don't know what happened. Here are the facts:

- request:
SELECT dwhinv___rfovsnide::varchar FROM dwhinv WHERE dwhinv___rfovsnide > '201212_cloture' ORDER BY dwhinv___rfovsnide LIMIT 1

- Plan:
Limit (cost=0.00..1.13 rows=1 width=12) (actual time=5798.915..5798.916 rows=1 loops=1)
  -> Index Scan using vsn_idx on dwhinv (cost=0.00..302591122.05 rows=267473826 width=12) (actual time=5798.912..5798.912 rows=1 loops=1)
       Index Cond: ((dwhinv___rfovsnide)::text > '201212_cloture'::text)
Total runtime: 5799.141 ms

- default_statistics_target = 200
- postgresql Version 8.4
- Index used: CREATE INDEX vsn_idx ON dwhinv USING btree (dwhinv___rfovsnide);

There are 26 distinct values of the column. This query used to take some milliseconds at most. The index has been freshly recreated.

What could be the problem?

Franck
On Thu, Apr 17, 2014 at 8:11 AM, Franck Routier <franck.routier@axege.com> wrote:
> Hi,
>
> we are using a mono-column index on a huge table to try to make a quick
> 'select distinct' on the column.
> This used to work fine, but... it does not anymore. We don't know what
> happened.
> Here are the facts:
> - request:
> SELECT dwhinv___rfovsnide::varchar FROM dwhinv WHERE dwhinv___rfovsnide > '201212_cloture' ORDER BY dwhinv___rfovsnide LIMIT 1

That is not equivalent to a distinct. There must be more to it than that.

> - Plan :
> Limit (cost=0.00..1.13 rows=1 width=12) (actual time=5798.915..5798.916 rows=1 loops=1)
>   -> Index Scan using vsn_idx on dwhinv (cost=0.00..302591122.05 rows=267473826 width=12) (actual time=5798.912..5798.912 rows=1 loops=1)
>        Index Cond: ((dwhinv___rfovsnide)::text > '201212_cloture'::text)
> Total runtime: 5799.141 ms

My best guess would be that the index got stuffed full of entries for rows that are not visible, either because they are not yet committed, or have been deleted but are not yet vacuumable. Do you have any long-lived transactions?

> - postgresql Version 8.4

Newer versions have better diagnostic tools. An explain (analyze, buffers) would be nice, especially with track_io_timing on.
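A quick way to look for such transactions on 8.4 (a sketch; pg_stat_activity there still exposes procpid and current_query, and an idle-in-transaction backend shows current_query = '&lt;IDLE&gt; in transaction'):

```sql
-- List backends sitting idle inside an open transaction, oldest first.
-- (PostgreSQL 8.4 column names; later versions renamed procpid/current_query.)
SELECT procpid,
       usename,
       current_query,
       age(now(), xact_start) AS xact_age
FROM pg_stat_activity
WHERE current_query = '<IDLE> in transaction'
ORDER BY xact_start;
```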
Cheers,
Jeff
Hi,

> That is not equivalent to a distinct. There must be more to it than that.

Indeed, this query is used in a loop:
CREATE OR REPLACE FUNCTION small_distinct(IN tablename character varying, IN fieldname character varying, IN sample anyelement DEFAULT ''::character varying)
  RETURNS SETOF anyelement AS
$BODY$
BEGIN
   -- Seed with the smallest value of the column (one index probe).
   EXECUTE 'SELECT '||fieldname||' FROM '||tablename||' ORDER BY '||fieldname
      ||' LIMIT 1' INTO sample;
   WHILE sample IS NOT NULL LOOP
      RETURN NEXT sample;
      -- Probe the index for the next value strictly greater than the current one.
      EXECUTE 'SELECT '||fieldname||' FROM '||tablename
         ||' WHERE '||fieldname||' > $1 ORDER BY '||fieldname||' LIMIT 1'
         INTO sample USING sample;
   END LOOP;
END;
$BODY$
LANGUAGE plpgsql VOLATILE
COST 100
ROWS 1000;
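It is invoked with something like the following (the third argument only serves to fix the polymorphic anyelement result type):

```sql
-- Hypothetical invocation: returns one row per distinct value of the column,
-- probing the btree index once per value instead of scanning the whole table.
SELECT * FROM small_distinct('dwhinv', 'dwhinv___rfovsnide', ''::varchar);
```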
Since the problem appeared, some iterations of the query are still quick (< 1 ms), but others are long (> 5 s).
> My best guess would be that the index got stuffed full of entries for rows that are not visible, either because they are not yet committed, or have been deleted but are not yet vacuumable. Do you have any long-lived transactions?

There has been a delete on the table (about 20% of the records). Then a manual VACUUM.
We have recreated the index, but it did not help.
In the explain analyze output, the index scan begins at 5798.912. What can be happening before that?

Index Scan using vsn_idx on dwhinv (cost=0.00..302591122.05 rows=267473826 width=12) (actual time=5798.912..5798.912 rows=1 loops=1)

(Notice the delay is not in planning itself, as EXPLAIN alone is instantaneous.)
> Newer versions have better diagnostic tools. An explain (analyze, buffers) would be nice, especially with track_io_timing on.

Yep, we certainly would like to, but this is a distant prod box, with no access to an online upgrade source, and no planned upgrade for now :-((
Regards,
Franck
On Thu, Apr 17, 2014 at 10:17 AM, Franck Routier <franck.routier@axege.com> wrote:
>> My best guess would be that the index got stuffed full of entries for rows that are not visible, either because they are not yet committed, or have been deleted but are not yet vacuumable. Do you have any long-lived transactions?
>
> There has been a delete on the table (about 20% of the records). Then a manual VACUUM.
>
> We have recreated the index, but it did not help.
If there are any open transactions (even ones that have never touched this particular table) which started before the delete was committed, then the vacuum was obliged to keep those deleted records around, in case that open transaction happens to start caring about them. I assume that the deleted rows were not randomly distributed, but rather were concentrated in the exact range you are now inspecting.
The reindex was also obliged to carry those deleted but not yet removable rows along into the new index.
> In the explain analyze output, the index scan begins at 5798.912. What can be happening before that?
The index scan reports its first *visible* row at 5798.912. Before that, it was probably digging through thousands or millions of deleted rows, laboriously verifying that they are not visible to it (but still theoretically visible to someone else).
It could be blocked on a lock or something, or you could have really horrible IO contention that takes 5 seconds to read two blocks. But I think the other scenario is more likely.
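One rough way to gauge how many such dead rows the table is carrying (a sketch; n_dead_tup is a statistics-collector estimate, available in 8.4):

```sql
-- Compare live vs. dead (deleted but not yet reclaimed) row estimates
-- and check when vacuum last ran on the table.
SELECT n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
FROM pg_stat_user_tables
WHERE relname = 'dwhinv';
```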
By the way, many people don't like silent cross-posting, as then we end up unknowingly answering a question here that was already answered elsewhere.
Cheers,
Jeff
On 17/04/2014 20:17, Jeff Janes wrote:
> If there are any open transactions (even ones that have never touched this particular table) which started before the delete was committed, then the vacuum was obliged to keep those deleted records around, in case that open transaction happens to start caring about them. I assume that the deleted rows were not randomly distributed, but rather were concentrated in the exact range you are now inspecting.
>
> The reindex was also obliged to carry those deleted but not yet removable rows along into the new index.

Ok, I understand now. Very clear explanation, thank you.
> The index scan reports its first *visible* row at 5798.912. Before that, it was probably digging through thousands or millions of deleted rows, laboriously verifying that they are not visible to it (but still theoretically visible to someone else).

Yes, I also think so.

> It could be blocked on a lock or something, or you could have really horrible IO contention that takes 5 seconds to read two blocks. But I think the other scenario is more likely.

> By the way, many people don't like silent cross-posting, as then we end up unknowingly answering a question here that was already answered elsewhere.

Yes, sorry for that. I don't like it either; this was posted by a colleague of mine, one of those young foolish guys who prefer web interfaces to plain mailing lists... Gonna scold him :-)
Best regards,
Franck
I have found the problem, using this query (found here: http://stackoverflow.com/questions/3312929/postgresql-idle-in-transaction-diagnosis-and-reading-pg-locks):

select pg_class.relname, pg_locks.transactionid, pg_locks.mode,
       pg_locks.granted as "g", pg_stat_activity.current_query,
       pg_stat_activity.query_start,
       age(now(), pg_stat_activity.query_start) as "age",
       pg_stat_activity.procpid
from pg_stat_activity, pg_locks
     left outer join pg_class on (pg_locks.relation = pg_class.oid)
where pg_locks.pid = pg_stat_activity.procpid
  and pg_stat_activity.procpid = <AN IDLE TRANSACTION PROCESS>
order by query_start;
And indeed, we constantly have idle transactions. They all use the same dummy table, a dual-table substitute containing only one column and one row.
We use this table with tomcat-jdbc-pool to check connection health with 'select 1 from dual' (we don't use 'select 1' for portability reasons, so it also works with Oracle).
And these transactions are never committed. So we have a bunch of running transactions, constantly recreated by tomcat-jdbc-pool. Some of them run for hours.
This seems to significantly impact the ability of postgresql to vacuum... and thus to keep efficient indexes!
Changing the configuration of tomcat-jdbc-pool to 'select 1 from dual; commit;' seems to resolve the problem.
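The effect can be reproduced as a sketch in two psql sessions (table and column names as earlier in the thread):

```sql
-- Session A: what the pool's validation query effectively leaves behind.
BEGIN;
SELECT 1 FROM dual;
-- ...connection is returned to the pool without COMMIT; the open
-- transaction's snapshot may then be held for hours.

-- Session B: rows deleted after session A's transaction started
-- cannot be reclaimed while that transaction stays open.
DELETE FROM dwhinv WHERE dwhinv___rfovsnide <= '201212_cloture';
VACUUM VERBOSE dwhinv;  -- reports dead row versions that "cannot be removed yet"
```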
I'm going to ask on tomcat-jdbc-pool mailing-list if this is ok.
Thanks a lot for your help.
Franck
On Fri, Apr 18, 2014 at 3:07 AM, Franck Routier <franck.routier@axege.com> wrote:
> I have found the problem, using this query (found here
> http://stackoverflow.com/questions/3312929/postgresql-idle-in-transaction-diagnosis-and-reading-pg-locks)
> [...]
> And indeed, we constantly have idle transactions. They all use the same
> dummy table, a dual table substitute containing only one column, and one
> row.
> We use this table with tomcat-jdbc-pool to check connections health with
> 'select 1 from dual' (we don't use 'select 1' for portability reasons, to
> work with oracle also).
> And these transactions are never committed. So we have a bunch of running
> transactions, constantly recreated by tomcat-jdbc-pool. Some of
> them run for hours.
> This seems to significantly impact the ability of postgresql to vacuum...
> and thus to keep efficient indexes!

It affects a lot of other things too. All locks held by those transactions are still held. Failure to release transactions is a major application error that can and will explode the database. It's similar in severity to a memory leak. The basic rule of thumb is that transactions should be held for the shortest possible duration -- especially those that write to the database.

merlin