Re: query plan not optimal

From: Marc Cousin
Subject: Re: query plan not optimal
Date:
Msg-id: 52B34240.4020905@gmail.com
In response to: Re: query plan not optimal (Jeff Janes <jeff.janes@gmail.com>)
Responses: query plan not optimal
List: pgsql-performance

On 19/12/2013 19:33, Jeff Janes wrote:
>                                   QUERY PLAN
>     ----------------------------------------------------------------------------------------------------------------------------------
>      Nested Loop  (cost=0.56..4001768.10 rows=479020 width=26) (actual time=2.303..15371.237 rows=479020 loops=1)
>        Output: path.pathid, batch.filename
>        Buffers: shared hit=2403958 read=7539
>        ->  Seq Scan on public.batch  (cost=0.00..11727.20 rows=479020 width=85) (actual time=0.340..160.142 rows=479020 loops=1)
>              Output: batch.path, batch.filename
>              Buffers: shared read=6937
>        ->  Index Scan using idx_path on public.path  (cost=0.56..8.32 rows=1 width=16) (actual time=0.030..0.031 rows=1 loops=479020)
>              Output: path.pathid, path.path
>              Index Cond: (path.path = batch.path)
>              Buffers: shared hit=2403958 read=602
>      Total runtime: 15439.043 ms
>
>
>     As you can see, it is more than twice as fast, with a very high hit
>     ratio on the path table, even starting from a cold cache (I cleared
>     both the PostgreSQL and OS caches here). The hit ratio is excellent
>     because the batch table contains few distinct paths (several files
>     per directory) and is already well clustered, since it comes from a
>     backup, which is of course performed directory by directory.
>
>
> What is your effective_cache_size set to?
>
> Cheers,
>
> Jeff
Yeah, I had forgotten to set it correctly in this test environment (it
is set correctly in production). Setting it to a few gigabytes here
gives me this cost:

bacula=# explain select pathid, filename from batch join path using (path);
                                 QUERY PLAN
----------------------------------------------------------------------------
 Nested Loop  (cost=0.56..2083904.10 rows=479020 width=26)
   ->  Seq Scan on batch  (cost=0.00..11727.20 rows=479020 width=85)
   ->  Index Scan using idx_path on path  (cost=0.56..4.32 rows=1 width=16)
         Index Cond: (path = batch.path)
(4 rows)

It still chooses the hash join, though by a smaller margin.
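The comparison above can be reproduced in a single psql session roughly as follows. This is a sketch with some assumptions. The message doesn't say how the nested-loop plan was forced, so turning off enable_hashjoin and enable_mergejoin is an assumption. '4GB' is only a stand-in for "a few gigabytes". Every SET here is session-local and leaves postgresql.conf untouched.

```sql
-- Assumption: '4GB' stands in for "a few gigabytes"; use a value that
-- matches the memory actually available for caching on the test machine.
SET effective_cache_size = '4GB';

-- Plan the planner picks on its own (the hash join, in this thread).
EXPLAIN SELECT pathid, filename FROM batch JOIN path USING (path);

-- Assumption: the competing join methods are disabled for this session
-- only, so the planner must cost the nested loop and the two estimates
-- can be compared side by side.
SET enable_hashjoin = off;
SET enable_mergejoin = off;
EXPLAIN SELECT pathid, filename FROM batch JOIN path USING (path);

-- Run it for real to see actual timings and shared-buffer hits/reads.
EXPLAIN (ANALYZE, BUFFERS, VERBOSE)
SELECT pathid, filename FROM batch JOIN path USING (path);

-- Restore the session defaults.
RESET enable_hashjoin;
RESET enable_mergejoin;
RESET effective_cache_size;
```

Keeping the settings at session level means the production configuration is never touched while experimenting.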

And the query will still only access a very small part of path (always
the same 5000 records), which, if I understand correctly, isn't
accounted for in the cost?
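Here is a rough worked check of that question, using only numbers from the plans quoted in this thread. It assumes the nested loop is costed as the outer scan plus one inner probe per outer row. Under that assumption, the displayed figures account for almost all of both estimates. Halving the per-probe cost (8.32 to 4.32) roughly halves the total. This approximation ignores per-row CPU charges, so it does not reproduce the reported totals exactly. The last two lines use the measured buffer counts from the EXPLAIN ANALYZE run. They show that almost every probe of path was served from shared buffers.

```latex
\begin{aligned}
C_{\text{nestloop}} &\approx C_{\text{batch}} + N_{\text{batch}} \times C_{\text{probe}} \\[4pt]
\text{low effective\_cache\_size:}\quad & 11727.20 + 479020 \times 8.32 = 3\,997\,173.6 \quad (\text{reported } 4\,001\,768.10) \\
\text{a few GB:}\quad & 11727.20 + 479020 \times 4.32 = 2\,081\,093.6 \quad (\text{reported } 2\,083\,904.10) \\[4pt]
\text{measured hit ratio on path} &= \frac{2\,403\,958}{2\,403\,958 + 602} \approx 0.99975 \\
\text{buffer accesses per probe} &= \frac{2\,403\,958 + 602}{479\,020} \approx 5.02
\end{aligned}
```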

