On Mon, Feb 3, 2014 at 1:35 PM, Pweaver (Paul Weaver) <pweaver@panjiva.com> wrote:
> We have been running into a (livelock?) issue on our production Postgres
> instance causing queries referencing a particular table to become extremely
> slow and our application to lock up.
Livelock? Really? That would imply that the query would never finish. A livelock is morally equivalent to an undetected deadlock.
"Livelock" is the wrong term for this.
> This tends to occur on a particular table that gets a lot of queries against
> it after a large number of deletes. When this happens, the following
> symptoms occur when queries referencing that table are run (even if we stop
> the deleting):
>
> SELECT * FROM table_name LIMIT 10; -- takes ~45 seconds to complete
> EXPLAIN SELECT * FROM table_name LIMIT 10; -- takes ~45 seconds to complete
> the explain query, the query plan looks reasonable
> EXPLAIN ANALYZE SELECT * FROM table_name LIMIT 10; -- takes ~45 seconds to
> complete the explain analyze query, query plan looks reasonable, timing
> stats say the query took sub-millisecond time to complete
Why would EXPLAIN ANALYZE say that? You'd need to catch the problem while it is actually happening.
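One way to catch it in the act is to look at what the backends are doing while a SELECT is stuck (a sketch, assuming a 9.2-or-later server where pg_stat_activity has "state" and "query" columns; on 9.1 and earlier the columns are "procpid" and "current_query" instead):

    SELECT pid, waiting, now() - query_start AS runtime, query
    FROM pg_stat_activity
    WHERE state = 'active'
    ORDER BY query_start;

    -- and see whether anything is blocked on an ungranted lock
    SELECT locktype, relation::regclass, mode, granted, pid
    FROM pg_locks
    WHERE NOT granted;

If the slow SELECTs show waiting = true, or pg_locks shows ungranted locks on the table, you have lock contention rather than a slow scan; if they show as active and not waiting, they are burning CPU or I/O, which points elsewhere.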
> SELECT * FROM another_table LIMIT 10; -- takes sub-millisecond time
> EXPLAIN SELECT * FROM another_table LIMIT 10; -- takes sub-millisecond time,
> query plan looks reasonable
>
> This behavior only stops and the queries go back to taking sub-millisecond
> time if we take the application issuing the SELECTs offline and wait for the
> active queries to finish (or terminate them).
>
> There is not a particularly large load on the database machine at the time,
> nor are there a particularly large number of WAL logs being written
> (although there is a burst of WAL writes immediately after the queue is
> cleared).
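Given the large number of deletes, it would also be worth checking whether the table is carrying a pile of dead tuples that the sequential scan (and even a LIMIT 10) has to grind through before finding live rows (a sketch; table_name stands in for the affected table):

    SELECT relname, n_live_tup, n_dead_tup,
           last_autovacuum, last_autoanalyze
    FROM pg_stat_user_tables
    WHERE relname = 'table_name';

A large n_dead_tup relative to n_live_tup, with a stale last_autovacuum, would suggest autovacuum isn't keeping up with the delete load, and that a manual VACUUM (or more aggressive autovacuum settings for that table) is worth trying.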