Re: What is the right way to deal with a table with rows that are not in a random order?

Поиск
Список
Период
Сортировка
От Douglas Alan
Тема Re: What is the right way to deal with a table with rows that are not in a random order?
Дата
Msg-id ce6334d00905281245l288c203p7d5a9d3b91718596@mail.gmail.com
обсуждение исходный текст
Ответ на Re: What is the right way to deal with a table with rows that are not in a random order?  (Scott Marlowe <scott.marlowe@gmail.com>)
Ответы Re: What is the right way to deal with a table with rows that are not in a random order?  (Scott Marlowe <scott.marlowe@gmail.com>)
Список pgsql-general
On Thu, May 28, 2009 at 10:41 AM, Scott Marlowe <scott.marlowe@gmail.com> wrote:

> Note that in the OPs case I'd probably try testing things like turning
> off seqscan, or lowering random_page_cost.  I'd also look at
> clustering on the index for the field you're selecting on.

I'm worried that turning off seqscan would distort other queries.  (Remember, I don't have control of the individual queries.  The user of the application can specify all sorts of queries, and there's an ORM in the middle.)

In order to force Postgres to do an index scan for this query, I had to set random_page_cost to 0.  Even 0.1 was not small enough.  Alternatively, I could set seq_page_cost to 39.  (38 was not big enough.)  Again, I'm worried that by using such a big hammer, I would distort Postgres's query planning for other queries.

|>ouglas

P.S. Here is the actual data that some people have been clamoring for:

Slow query:

psql> select * from maindb_astobject
     where survey_id = 2
     limit 20 offset 1000;

Time: 18073.691 ms


Here's the explanation:

                                    QUERY PLAN                                      
-------------------------------------------------------------------------------------
Limit  (cost=47.99..48.95 rows=20 width=153)
  ->  Seq Scan on maindb_astobject  (cost=0.00..3538556.10 rows=73736478 width=153)
        Filter: (survey_id = 2)

The same query made fast by removing Postgres's ability to know a priori what particular value is being searched for:

psql> select * from maindb_astobject join maindb_enumentity
     on maindb_astobject.survey_id = maindb_enumentity.id
     where entityname = 'MACHO'
     limit 20 offet 1000;

Time: 1.638 ms

Here's the explanation for the above query showing that it is using the index:

QUERY PLAN                                    
  --------------------------------------------------------------------------
   Limit  (cost=164.97..168.27 rows=20 width=215)
     ->  Nested Loop  (cost=0.00..1233523.72 rows=7477081 width=215)
   ->  Seq Scan on maindb_enumentity  (cost=0.00..1.12 rows=1 width=62)
 Filter: ((entityname)::text = 'MACHO'::text)
   ->  Index Scan using maindb_astobject_survey_id on maindb_astobject  (cost=0.00..1046595.57 rows=14954162 width=153)
 Index Cond: (maindb_astobject.survey_id = maindb_enumentity.id)


P.P.S. Many other queries were very slow due to Postgres wanting to use hash joins and merge joins rather than nested index scans.  The nested index scans were, in fact, much faster.  I just turned off hash joins and merge joins:  problem solved.  It might be nice at some point to figure out what is going with Postgres trying to use these fancier joins that turn out to be much slower for us, but that's a worry for another day, since I have a perfectly good work-around at the moment.

В списке pgsql-general по дате отправления:

Предыдущее
От: Osvaldo Kussama
Дата:
Сообщение: Re: How to pass parameters into a sql script ?
Следующее
От: Scott Marlowe
Дата:
Сообщение: Re: What is the right way to deal with a table with rows that are not in a random order?