On 19/12/2013 19:33, Jeff Janes wrote:
>                                 QUERY PLAN
> ----------------------------------------------------------------------------------------------------------------------------------
>  Nested Loop  (cost=0.56..4001768.10 rows=479020 width=26) (actual time=2.303..15371.237 rows=479020 loops=1)
>    Output: path.pathid, batch.filename
>    Buffers: shared hit=2403958 read=7539
>    ->  Seq Scan on public.batch  (cost=0.00..11727.20 rows=479020 width=85) (actual time=0.340..160.142 rows=479020 loops=1)
>          Output: batch.path, batch.filename
>          Buffers: shared read=6937
>    ->  Index Scan using idx_path on public.path  (cost=0.56..8.32 rows=1 width=16) (actual time=0.030..0.031 rows=1 loops=479020)
>          Output: path.pathid, path.path
>          Index Cond: (path.path = batch.path)
>          Buffers: shared hit=2403958 read=602
>  Total runtime: 15439.043 ms
>
>
> As you can see, more than twice as fast, and a very high hit ratio
> on the path table, even if we start from a cold cache (I did, here,
> both PostgreSQL and OS). We have an excellent hit ratio because the
> batch table contains few distinct paths (several files in a
> directory), and is already quite clustered, as it comes from a
> backup, which is of course performed directory by directory.
>
>
> What is your effective_cache_size set to?
>
> Cheers,
>
> Jeff
Yeah, I had forgotten to set it up correctly on this test environment
(it is set correctly in production environments). Setting it to a
few gigabytes here gives me this cost:
bacula=# explain select pathid, filename from batch join path using (path);
                                 QUERY PLAN
----------------------------------------------------------------------------
 Nested Loop  (cost=0.56..2083904.10 rows=479020 width=26)
   ->  Seq Scan on batch  (cost=0.00..11727.20 rows=479020 width=85)
   ->  Index Scan using idx_path on path  (cost=0.56..4.32 rows=1 width=16)
         Index Cond: (path = batch.path)
(4 rows)
It still chooses the hash join, but by a smaller margin.
The query will also still touch only a very small part of path (always
the same 5000 records), which isn't accounted for in the cost, if I
understand correctly?
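
For what it's worth, here is a rough back-of-the-envelope check of the two nested loop totals, written as a small Python sketch. It assumes the total is approximately the outer scan cost plus one inner index probe per outer row; the planner's real formula also adds per-tuple CPU charges and uses unrounded inner costs, so only approximate agreement should be expected. The function name and the dictionary labels are mine, not anything from PostgreSQL.

```python
# Approximate the nested loop totals from the two EXPLAIN outputs in this thread.
# Assumption (simplified model, not the planner's exact formula):
#   total ~= outer scan cost + outer rows * inner cost per probe

OUTER_COST = 11727.20   # Seq Scan on batch, total cost (same in both plans)
OUTER_ROWS = 479020     # rows from batch, i.e. number of index probes on path


def approx_nestloop_cost(inner_cost_per_probe: float) -> float:
    """Outer scan cost plus one inner index probe per outer row."""
    return OUTER_COST + OUTER_ROWS * inner_cost_per_probe


# (inner cost per probe, total cost reported by EXPLAIN)
plans = {
    "first plan (quoted)": (8.32, 4001768.10),
    "after raising effective_cache_size": (4.32, 2083904.10),
}

for label, (inner_cost, reported) in plans.items():
    estimate = approx_nestloop_cost(inner_cost)
    rel_error = abs(estimate - reported) / reported
    print(f"{label}: estimate={estimate:.2f} reported={reported:.2f} "
          f"relative error={rel_error:.4%}")
```

In this simplified model every one of the 479020 probes is charged the same per-probe cost, which is why halving the index scan cost from 8.32 to 4.32 roughly halves the total.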