Discussion: Poor performance on a simple join


Poor performance on a simple join

From:
CS DBA
Date:
Hi All;

The contab2 table below contains ~400,000 rows. This query should not
take this long.  We've tweaked work_mem up to 50MB, ensured that the
appropriate indexes are in place, etc...

Thoughts?

Thanks in advance


Explain analyze:
SELECT contab2.contacts_tab
FROM contab2
INNER JOIN sctab
     ON sctab.id = contab2.to_service_id
         AND sctab.type IN ('FService', 'FqService', 'LService', 'TService')
WHERE contab2.from_contact_id=402513;

                                                                 QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=16904.28..25004.54 rows=26852 width=4) (actual time=302.621..371.599 rows=12384 loops=1)
   Hash Cond: (contab2.to_service_id = sctab.id)
   ->  Bitmap Heap Scan on contab2  (cost=1036.49..8566.14 rows=26852 width=20) (actual time=5.191..32.701 rows=26963 loops=1)
         Recheck Cond: (from_contact_id = 402513)
         ->  Bitmap Index Scan on index_contab2_on_from_user_id  (cost=0.00..1029.78 rows=26852 width=0) (actual time=4.779..4.779 rows=26963 loops=1)
               Index Cond: (from_contact_id = 402513)
   ->  Hash  (cost=14445.19..14445.19 rows=113808 width=16) (actual time=297.332..297.332 rows=129945 loops=1)
         Buckets: 16384  Batches: 1  Memory Usage: 6092kB
         ->  Bitmap Heap Scan on sctab  (cost=2447.07..14445.19 rows=113808 width=16) (actual time=29.480..187.166 rows=129945 loops=1)
               Recheck Cond: ((type)::text = ANY ('{FService,FqService,LService,TService}'::text[]))
               ->  Bitmap Index Scan on index_sctab_on_type  (cost=0.00..2418.62 rows=113808 width=0) (actual time=27.713..27.713 rows=130376 loops=1)
                     Index Cond: ((type)::text = ANY ('{FService,FqService,LService,TService}'::text[]))
 Total runtime: 382.514 ms
(13 rows)

--
---------------------------------------------
Kevin Kempter       -       Consistent State
A PostgreSQL Professional Services Company
           www.consistentstate.com
---------------------------------------------


Re: Poor performance on a simple join

From:
Scott Marlowe
Date:
On Wed, Nov 2, 2011 at 2:21 PM, CS DBA <cs_dba@consistentstate.com> wrote:
> Hi All;
>
> The contab2 table below contains ~400,000 rows. This query should not take
> this long.  We've tweaked work_mem up to 50MB, ensured that the appropriate
> indexes are in place, etc...
>
> Thoughts?
>
> Thanks in advance

How long should it take?  300 milliseconds is fairly fast for mushing
129k rows up against 26k rows and getting 12k rows back.  That's 40
rows / millisecond, which isn't too bad really.


What pg version are you running?  What evidence do you have that this
is slow, e.g. other machines you've run it on where it's faster?  What
hardware (CPU, RAM, IO subsystem, OS) are you running on?

>
>
> Explain analyze:
> SELECT contab2.contacts_tab
> FROM contab2
> INNER JOIN sctab
>    ON sctab.id = contab2.to_service_id
>        AND sctab.type IN ('FService', 'FqService', 'LService', 'TService')
> WHERE contab2.from_contact_id=402513;
>                                                                  QUERY PLAN
> -----------------------------------------------------------------------------------------------------------------------------------------------------------
>  Hash Join  (cost=16904.28..25004.54 rows=26852 width=4) (actual time=302.621..371.599 rows=12384 loops=1)
>    Hash Cond: (contab2.to_service_id = sctab.id)
>    ->  Bitmap Heap Scan on contab2  (cost=1036.49..8566.14 rows=26852 width=20) (actual time=5.191..32.701 rows=26963 loops=1)
>          Recheck Cond: (from_contact_id = 402513)
>          ->  Bitmap Index Scan on index_contab2_on_from_user_id  (cost=0.00..1029.78 rows=26852 width=0) (actual time=4.779..4.779 rows=26963 loops=1)
>                Index Cond: (from_contact_id = 402513)
>    ->  Hash  (cost=14445.19..14445.19 rows=113808 width=16) (actual time=297.332..297.332 rows=129945 loops=1)
>          Buckets: 16384  Batches: 1  Memory Usage: 6092kB
>          ->  Bitmap Heap Scan on sctab  (cost=2447.07..14445.19 rows=113808 width=16) (actual time=29.480..187.166 rows=129945 loops=1)
>                Recheck Cond: ((type)::text = ANY ('{FService,FqService,LService,TService}'::text[]))
>                ->  Bitmap Index Scan on index_sctab_on_type  (cost=0.00..2418.62 rows=113808 width=0) (actual time=27.713..27.713 rows=130376 loops=1)
>                      Index Cond: ((type)::text = ANY ('{FService,FqService,LService,TService}'::text[]))
>  Total runtime: 382.514 ms
> (13 rows)
>
> --
> ---------------------------------------------
> Kevin Kempter       -       Consistent State
> A PostgreSQL Professional Services Company
>          www.consistentstate.com
> ---------------------------------------------
>
>



--
To understand recursion, one must first understand recursion.

Re: Poor performance on a simple join

From:
CS DBA
Date:
On 11/02/2011 02:45 PM, Scott Marlowe wrote:
> On Wed, Nov 2, 2011 at 2:21 PM, CS DBA <cs_dba@consistentstate.com> wrote:
>> Hi All;
>>
>> The contab2 table below contains ~400,000 rows. This query should not take
>> this long.  We've tweaked work_mem up to 50MB, ensured that the appropriate
>> indexes are in place, etc...
>>
>> Thoughts?
>>
>> Thanks in advance
> How long should it take?  300 milliseconds is fairly fast for mushing
> 129k rows up against 26k rows and getting 12k rows back.  That's 40
> rows / millisecond, which isn't too bad really.
>
>
> What pg version are you running?  What evidence do you have that this
> is slow, e.g. other machines you've run it on where it's faster?  What
> hardware (CPU, RAM, IO subsystem, OS) are you running on?
>
Agreed, but it's not fast enough for the client.  I think we're going to
look at creating an aggregate table or maybe partitioning.
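
A rough sketch of the aggregate-table idea, assuming the join result
can be precomputed ahead of the hot query (the table and index names
below are illustrative, not part of the existing schema):

-- Precompute the join once; refresh it on whatever schedule the
-- client can tolerate.
CREATE TABLE contact_service_agg AS
SELECT contab2.from_contact_id, contab2.contacts_tab
FROM contab2
INNER JOIN sctab
     ON sctab.id = contab2.to_service_id
         AND sctab.type IN ('FService', 'FqService', 'LService', 'TService');

-- Index the lookup key so per-contact fetches become a cheap index scan.
CREATE INDEX index_contact_service_agg_on_from_contact_id
    ON contact_service_agg (from_contact_id);

-- The hot query then avoids the join and the 129k-row hash build entirely:
SELECT contacts_tab
FROM contact_service_agg
WHERE from_contact_id = 402513;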



--
---------------------------------------------
Kevin Kempter       -       Consistent State
A PostgreSQL Professional Services Company
           www.consistentstate.com
---------------------------------------------


Re: Poor performance on a simple join

From:
Scott Marlowe
Date:
On Wed, Nov 2, 2011 at 3:53 PM, CS DBA <cs_dba@consistentstate.com> wrote:
> On 11/02/2011 02:45 PM, Scott Marlowe wrote:
>>
>> On Wed, Nov 2, 2011 at 2:21 PM, CS DBA <cs_dba@consistentstate.com> wrote:
>>>
>>> Hi All;
>>>
>>> The contab2 table below contains ~400,000 rows. This query should not
>>> take
>>> this long.  We've tweaked work_mem up to 50MB, ensured that the
>>> appropriate
>>> indexes are in place, etc...
>>>
>>> Thoughts?
>>>
>>> Thanks in advance
>>
>> How long should it take?  300 milliseconds is fairly fast for mushing
>> 129k rows up against 26k rows and getting 12k rows back.  That's 40
>> rows / millisecond, which isn't too bad really.
>>
>>
>> What pg version are you running?  What evidence do you have that this
>> is slow, e.g. other machines you've run it on where it's faster?  What
>> hardware (CPU, RAM, IO subsystem, OS) are you running on?
>>
> Agreed, but it's not fast enough for the client.  I think we're going to
> look at creating an aggregate table or maybe partitioning.

Take a look here:
http://tech.jonathangardner.net/wiki/PostgreSQL/Materialized_Views
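
The technique described there is essentially a snapshot table plus a
refresh routine. A minimal sketch, reusing the hypothetical
contact_service_agg table from the earlier message (on PostgreSQL 9.3
and later, CREATE MATERIALIZED VIEW / REFRESH MATERIALIZED VIEW do
this natively):

-- Hand-rolled materialized view refresh: rebuild the snapshot table.
-- Run from cron or after bulk loads, per the article above.
CREATE OR REPLACE FUNCTION refresh_contact_service_agg() RETURNS void AS $$
BEGIN
    TRUNCATE contact_service_agg;
    INSERT INTO contact_service_agg
    SELECT contab2.from_contact_id, contab2.contacts_tab
    FROM contab2
    INNER JOIN sctab
         ON sctab.id = contab2.to_service_id
             AND sctab.type IN ('FService', 'FqService', 'LService', 'TService');
END;
$$ LANGUAGE plpgsql;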

Re: Poor performance on a simple join

From:
Shaun Thomas
Date:
On 11/02/2011 09:04 PM, Scott Marlowe wrote:

> Take a look here:
> http://tech.jonathangardner.net/wiki/PostgreSQL/Materialized_Views

Not sure materialized views are the approach I would take here. We
actually see a lot of these kinds of queries with giant result sets
here. If they actually need all 12k rows for every execution (not
likely, but possible) and 300ms is just too darn slow for that, there's
always client-side caching.

We have a couple queries that we need to keep cached at all times. Stock
quotes and positions, for example, have to be available in sub-ms time
thanks to the level of parallelism involved. One query in particular
effectively grabs the entire set of current positions and every
optimization in the book brings its execution time down to about two
seconds. We can't have thousands of clients executing that all the time,
so it gets shoved into a local memcached on each webserver.

But if he's getting back 12k rows even *after* specifying a contact ID,
a materialized view is still going to return 12k rows, and still has to
perform at least an index scan unless he creates an MV for each contact
ID (eww). This doesn't really look like fact-table territory either.

I think the real question is: Why isn't 300ms fast enough? Is it because
the client executes this repeatedly? If so, what changes often enough
that it must fetch all 12k rows every single time? Would implementing a
timestamp and only grabbing newer rows work better? Is it because of
several connections each running it in parallel? Why not cache a local
copy and refresh periodically? Do they actually need all 12k rows every
time? Maybe some LIMIT and OFFSET clauses are in order.
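
For instance, the timestamp idea might look something like this,
assuming contab2 gained a hypothetical updated_at column (none of this
is in the original schema, and an UPDATE trigger would be needed to
keep the column current):

-- Track when rows change so clients can fetch deltas instead of the
-- full 12k-row result set.
ALTER TABLE contab2 ADD COLUMN updated_at timestamptz NOT NULL DEFAULT now();
CREATE INDEX index_contab2_on_updated_at ON contab2 (updated_at);

-- The client remembers the timestamp of its last fetch and asks only
-- for rows changed since then:
SELECT contab2.contacts_tab
FROM contab2
INNER JOIN sctab
     ON sctab.id = contab2.to_service_id
         AND sctab.type IN ('FService', 'FqService', 'LService', 'TService')
WHERE contab2.from_contact_id = 402513
  AND contab2.updated_at > '2011-11-02 14:21:00-06';  -- last-seen timestamp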

There's very little a human can do with 12k results. An automated tool
shouldn't be grabbing them either, unless they're actually changing with
every execution. If they're not, the tool really wants items since the
last change, or it's doing too much work. If it were a report, 300ms is
nothing compared to most reporting queries, which can run for several
minutes.

I think we're missing a lot of details here.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
sthomas@peak6.com
