Thread: two table join just not fast enough.


two table join just not fast enough.

From:
Brian Fehrle
Date:
Hi all,

I've got a query that I need to squeeze as much speed out of as I can.

When I execute this query, the average time it takes is about 190 ms. I
increased my work_mem from 1MB to 50MB, which brought the average down to
about 170 ms, but that's still not fast enough. This query is executed
extremely frequently, so most of the data it touches should already be
cached.

Some settings
work_mem = 50MB
shared_buffers = 5GB

I've made sure that the two tables have been analyzed recently (with
default_statistics_target at 100, 400, and even 1500), and table bloat is
low (the 150 MB table has about 7 MB of bloat).
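
For reference, the per-column form of that statistics knob (using the
obfuscated names from the plan below) looks like:

-- sample the skewed filter column more heavily, then re-analyze
ALTER TABLE hotel_zulu ALTER COLUMN type SET STATISTICS 1500;
ANALYZE hotel_zulu;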

Here's the query:
SELECT yankee.my_id
FROM yankee
INNER JOIN hotel_zulu
     ON hotel_zulu.my_id = yankee.zulu_id
         AND hotel_zulu.type IN ('string1', 'string2', 'string3', 'string4')
WHERE yankee.your_id=402513;
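
To verify how much of this really is cached, the same statement can be run
with the BUFFERS option of EXPLAIN, which breaks each plan node's work out
into shared-buffer hits versus disk reads:

-- adds "Buffers: shared hit=... read=..." lines to every node
EXPLAIN (ANALYZE, BUFFERS)
SELECT yankee.my_id
FROM yankee
INNER JOIN hotel_zulu
     ON hotel_zulu.my_id = yankee.zulu_id
         AND hotel_zulu.type IN ('string1', 'string2', 'string3', 'string4')
WHERE yankee.your_id=402513;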

And here is a query plan.

Hash Join  (cost=17516.470..26386.660 rows=27624 width=4) (actual time=309.194..395.135 rows=12384 loops=1)
  Hash Cond: (yankee.alpha = hotel_zulu.quebec)
  ->  Bitmap Heap Scan on yankee  (cost=1066.470..8605.770 rows=27624 width=20) (actual time=5.178..34.693 rows=26963 loops=1)
        Recheck Cond: (mike = 402513)
        ->  Bitmap Index Scan on hotel_alpha  (cost=0.000..1059.570 rows=27624 width=0) (actual time=4.770..4.770 rows=26967 loops=1)
              Index Cond: (mike = 402513)
  ->  Hash  (cost=14465.080..14465.080 rows=114154 width=16) (actual time=303.717..303.717 rows=129908 loops=1)
        Buckets: 4096  Batches: 8  Memory Usage: 784kB
        ->  Bitmap Heap Scan on hotel_zulu  (cost=2461.770..14465.080 rows=114154 width=16) (actual time=25.642..185.253 rows=129908 loops=1)
              Recheck Cond: ((two)::golf = ANY ('xray'::golf[]))
              ->  Bitmap Index Scan on kilo  (cost=0.000..2433.230 rows=114154 width=0) (actual time=23.887..23.887 rows=130292 loops=1)
                    Index Cond: ((two)::golf = ANY ('xray'::golf[]))



One thing I notice is that the planner estimates 27624 rows while the join
actually returns 12384. Quite a bit different.

Table 2 (known here as hotel_zulu) is joined to the first table on zulu_id,
and then a WHERE clause filters on the column 'type'. There are
single-column indexes on each of these columns, and any multi-column index
I put on them is simply ignored by the planner.
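
The multi-column variants in question look like the following (index names
are arbitrary); column order matters, and whether either actually helps
depends on the data distribution:

-- filter column first, join column second, on each side of the join
CREATE INDEX yankee_your_zulu_idx ON yankee (your_id, zulu_id);
CREATE INDEX hotel_zulu_type_id_idx ON hotel_zulu (type, my_id);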

Any thoughts on ways to tweak this?

- Brian F

Re: two table join just not fast enough.

From:
Tom Lane
Date:
Brian Fehrle <brianf@consistentstate.com> writes:
> I've got a query that I need to squeeze as much speed out of as I can.

Hmm ... are you really sure this is being run with work_mem = 50MB?
The hash join is getting "batched", which means the executor thinks it's
working under a memory constraint significantly less than the size of
the filtered inner relation, which should be no more than a couple
megabytes according to this.
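
An easy way to double-check what the backend actually sees:

SHOW work_mem;
-- or, to see where the active value came from (file, role, session, ...):
SELECT setting, source FROM pg_settings WHERE name = 'work_mem';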

I'm not sure how much that will save, since the hashjoin seems to be
reasonably speedy anyway, but there's not much other fat to trim here.

One minor suggestion is to think whether you really need string
comparisons here or could convert that to use of an enum type.
String compares ain't cheap, especially not in non-C locales.
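
Something along these lines, using the obfuscated names from your query
(the enum labels stand in for the real strings):

-- enum values compare as small integers internally, so no
-- locale-aware string collation is involved
CREATE TYPE hotel_zulu_type AS ENUM
    ('string1', 'string2', 'string3', 'string4');

-- note: this rewrites the table and rebuilds its indexes
ALTER TABLE hotel_zulu
    ALTER COLUMN type TYPE hotel_zulu_type
    USING type::hotel_zulu_type;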

            regards, tom lane

Re: two table join just not fast enough.

From:
Brian Fehrle
Date:
Thanks Tom,
And it looks like I pasted an older explain plan; it's almost exactly the
same as the one with 50MB work_mem, except for the hash join's 'Batches'
line, which showed more memory used and only a single batch.

When running with 50MB of work_mem instead of 1MB, the query went from an
average of 190 ms to 169 ms, so it did help some, but it wasn't a game
changer. (I also found that for this specific query, anything beyond 6MB
of work_mem made no further difference.)
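
Since a bump to the global setting is paid by every sort and hash in every
backend, another option is to scope it to just this query, something like:

BEGIN;
SET LOCAL work_mem = '6MB';  -- reverts automatically at COMMIT/ROLLBACK
SELECT yankee.my_id
FROM yankee
INNER JOIN hotel_zulu
     ON hotel_zulu.my_id = yankee.zulu_id
         AND hotel_zulu.type IN ('string1', 'string2', 'string3', 'string4')
WHERE yankee.your_id=402513;
COMMIT;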

I have other plans to try to get this thing running faster, I'll be
exploring them tomorrow, as well as looking at using an enum type.

- Brian F

On 11/02/2011 05:53 PM, Tom Lane wrote:
> Brian Fehrle <brianf@consistentstate.com> writes:
>> I've got a query that I need to squeeze as much speed out of as I can.
> Hmm ... are you really sure this is being run with work_mem = 50MB?
> The hash join is getting "batched", which means the executor thinks it's
> working under a memory constraint significantly less than the size of
> the filtered inner relation, which should be no more than a couple
> megabytes according to this.
>
> I'm not sure how much that will save, since the hashjoin seems to be
> reasonably speedy anyway, but there's not much other fat to trim here.
>
> One minor suggestion is to think whether you really need string
> comparisons here or could convert that to use of an enum type.
> String compares ain't cheap, especially not in non-C locales.
>
>             regards, tom lane


Re: two table join just not fast enough.

From:
Mark Kirkwood
Date:
On 03/11/11 09:12, Brian Fehrle wrote:
>
>
> And here is a query plan.
>
> Hash Join  (cost=17516.470..26386.660 rows=27624 width=4) (actual
> time=309.194..395.135 rows=12384 loops=1)
>     Hash Cond: (yankee.alpha = hotel_zulu.quebec)
>   ->   Bitmap Heap Scan on yankee  (cost=1066.470..8605.770 rows=27624
> width=20) (actual time=5.178..34.693 rows=26963 loops=1)
>           Recheck Cond: (mike = 402513)
>         ->   Bitmap Index Scan on hotel_alpha  (cost=0.000..1059.570
> rows=27624 width=0) (actual time=4.770..4.770 rows=26967 loops=1)
>                 Index Cond: (mike = 402513)
>   ->   Hash  (cost=14465.080..14465.080 rows=114154 width=16) (actual
> time=303.717..303.717 rows=129908 loops=1)
>           Buckets: 4096  Batches: 8  Memory Usage: 784kB
>         ->   Bitmap Heap Scan on hotel_zulu  (cost=2461.770..14465.080
> rows=114154 width=16) (actual time=25.642..185.253 rows=129908 loops=1)
>                 Recheck Cond: ((two)::golf = ANY ('xray'::golf[]))
>               ->   Bitmap Index Scan on kilo  (cost=0.000..2433.230
> rows=114154 width=0) (actual time=23.887..23.887 rows=130292 loops=1)
>                       Index Cond: ((two)::golf = ANY ('xray'::golf[]))
>

Might be worth posting the table definitions, as this plan does not
immediately look like it came from the query you posted. Also, unless I am
misreading the output, it looks like you have some custom datatypes (e.g.
'golf'), so more info there could be useful too.

When we have that, there may be something to be learned from examining
the pg_stats data for the join and predicate columns used in these queries.
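
Something like this (with the real table and column names substituted back
in) would show what the planner has to work with:

-- distinct-value estimates and most-common-value lists for the
-- join and predicate columns
SELECT tablename, attname, n_distinct,
       most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename IN ('yankee', 'hotel_zulu')
  AND attname IN ('your_id', 'zulu_id', 'my_id', 'type');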

regards

Mark