Thread: Slow query postgres 8.3

Slow query postgres 8.3

From:
"Anne Rosset"
Date:

Hi,

I am trying to tune a query that is taking too much time on a large dataset (postgres 8.3).

 

SELECT DISTINCT
       role_user.project_id AS projectId,
       sfuser.username AS adminUsername,
       sfuser.full_name AS adminFullName
FROM
       role_operation role_operation,
       role role,
       sfuser sfuser,
       role_user role_user
WHERE
       role_operation.role_id = role.id
       AND role.id = role_user.role_id
       AND role_user.user_id = sfuser.id
       AND role_operation.object_type_id = 'SfMain.Project'
       AND role_operation.operation_category = 'admin'
       AND role_operation.operation_name = 'admin'
ORDER BY
       adminFullName ASC

 

 

It has the following query plan:

QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=1218.57..1221.26 rows=269 width=35) (actual time=16700.332..17212.849 rows=30136 loops=1)
   ->  Sort  (cost=1218.57..1219.24 rows=269 width=35) (actual time=16700.306..16885.972 rows=41737 loops=1)
         Sort Key: sfuser.full_name, role_user.project_id, sfuser.username
         Sort Method:  quicksort  Memory: 4812kB
         ->  Nested Loop  (cost=0.00..1207.71 rows=269 width=35) (actual time=71.173..15788.798 rows=41737 loops=1)
               ->  Nested Loop  (cost=0.00..1118.22 rows=269 width=18) (actual time=65.550..12440.383 rows=41737 loops=1)
                     ->  Nested Loop  (cost=0.00..256.91 rows=41 width=18) (actual time=19.312..7150.925 rows=6108 loops=1)
                           ->  Index Scan using role_oper_obj_oper on role_operation  (cost=0.00..85.15 rows=41 width=9) (actual time=19.196..2561.765 rows=6108 loops=1)
                                 Index Cond: (((object_type_id)::text = 'SfMain.Project'::text) AND ((operation_category)::text = 'admin'::text) AND ((operation_name)::text = 'admin'::text))
                           ->  Index Scan using role_pk on role  (cost=0.00..4.18 rows=1 width=9) (actual time=0.727..0.732 rows=1 loops=6108)
                                 Index Cond: ((role.id)::text = (role_operation.role_id)::text)
                     ->  Index Scan using role_user_proj_idx on role_user  (cost=0.00..20.84 rows=13 width=27) (actual time=0.301..0.795 rows=7 loops=6108)
                           Index Cond: ((role_user.role_id)::text = (role_operation.role_id)::text)
               ->  Index Scan using sfuser_pk on sfuser  (cost=0.00..0.32 rows=1 width=35) (actual time=0.056..0.062 rows=1 loops=41737)
                     Index Cond: ((sfuser.id)::text = (role_user.user_id)::text)
 Total runtime: 17343.185 ms
(16 rows)

 

 

I have tried adding an index on role_operation.role_id, but it didn't seem to help. I have also tried changing the query to:

SELECT
       role_user.project_id AS projectId,
       sfuser.username AS adminUsername,
       sfuser.full_name AS adminFullName
FROM
       sfuser sfuser,
       role_user role_user
WHERE
       role_user.role_id IN (
             SELECT role_operation.role_id
             FROM role_operation
             WHERE role_operation.object_type_id = 'SfMain.Project'
               AND role_operation.operation_category = 'admin'
               AND role_operation.operation_name = 'admin')
       AND role_user.user_id = sfuser.id
ORDER BY
       adminFullName ASC
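
For reference, the index I added was roughly the following (the index name here is just illustrative):

CREATE INDEX role_operation_role_id_idx ON role_operation (role_id);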
 
None of this seemed to improve the performance.
 
Does anyone have a suggestion?

 

Thanks a lot,

Anne

Re: Slow query postgres 8.3

From:
tv@fuzzy.cz
Date:
> Hi,
>
> I am trying to tune a query that is taking too much time on a large
> dataset (postgres 8.3).
>

Hi, run ANALYZE on the tables used in the query - the stats are very off,
so the db chooses a really bad execution plan.
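
For example (a minimal sketch, just covering the tables referenced in your query):

ANALYZE role_operation;
ANALYZE role;
ANALYZE role_user;
ANALYZE sfuser;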

Tomas


Re: Slow query postgres 8.3

From:
"Anne Rosset"
Date:
Hi Tomas,
Here is the plan after running ANALYZE:
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=1330.27..1333.24 rows=297 width=35) (actual time=4011.861..4526.583 rows=30136 loops=1)
   ->  Sort  (cost=1330.27..1331.01 rows=297 width=35) (actual time=4011.828..4198.006 rows=41737 loops=1)
         Sort Key: sfuser.full_name, role_user.project_id, sfuser.username
         Sort Method:  quicksort  Memory: 4812kB
         ->  Nested Loop  (cost=0.00..1318.07 rows=297 width=35) (actual time=0.622..3107.994 rows=41737 loops=1)
               ->  Nested Loop  (cost=0.00..1219.26 rows=297 width=18) (actual time=0.426..1212.175 rows=41737 loops=1)
                     ->  Nested Loop  (cost=0.00..282.11 rows=45 width=18) (actual time=0.325..371.295 rows=6108 loops=1)
                           ->  Index Scan using role_oper_obj_oper on role_operation  (cost=0.00..93.20 rows=45 width=9) (actual time=0.236..71.291 rows=6108 loops=1)
                                 Index Cond: (((object_type_id)::text = 'SfMain.Project'::text) AND ((operation_category)::text = 'admin'::text) AND ((operation_name)::text = 'admin'::text))
                           ->  Index Scan using role_pk on role  (cost=0.00..4.19 rows=1 width=9) (actual time=0.025..0.030 rows=1 loops=6108)
                                 Index Cond: ((role.id)::text = (role_operation.role_id)::text)
                     ->  Index Scan using role_user_proj_idx on role_user  (cost=0.00..20.66 rows=13 width=27) (actual time=0.025..0.066 rows=7 loops=6108)
                           Index Cond: ((role_user.role_id)::text = (role_operation.role_id)::text)
               ->  Index Scan using sfuser_pk on sfuser  (cost=0.00..0.32 rows=1 width=35) (actual time=0.022..0.027 rows=1 loops=41737)
                     Index Cond: ((sfuser.id)::text = (role_user.user_id)::text)
 Total runtime: 4657.488 ms
(16 rows)

Is there anything more that can be done, for instance about the ~1 s spent in the index scan on sfuser?
Thanks,
Anne

-----Original Message-----
From: tv@fuzzy.cz [mailto:tv@fuzzy.cz] 
Sent: Saturday, April 09, 2011 3:36 AM
To: Anne Rosset
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Slow query postgres 8.3

> Hi,
>
> I am trying to tune a query that is taking too much time on a large 
> dataset (postgres 8.3).
>

Hi, run ANALYZE on the tables used in the query - the stats are very off, so the db chooses a really bad execution
plan.

Tomas


Re: Slow query postgres 8.3

From:
"Kevin Grittner"
Date:
"Anne Rosset" <arosset@collab.net> wrote:

>                            ->  Index Scan using role_oper_obj_oper
> on role_operation  (cost=0.00..93.20 rows=45 width=9) (actual
> time=0.236..71.291 rows=6108 loops=1)
>                                  Index Cond:
> (((object_type_id)::text = 'SfMain.Project'::text) AND
> ((operation_category)::text = 'admin'::text) AND
> ((operation_name)::text = 'admin'::text))

This looks like another case where there is a correlation among
multiple values used for selection.  The optimizer assumes, for
example, that category = 'admin' will be true no more often for rows
with operation_name = 'admin' than for other values of
operation_name.  There has been much talk lately about how to make
it smarter about that, but right now there's no general solution,
and workarounds can be tricky.
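
For example, something like this (just an illustrative query, nothing official) would show how strongly two of those columns are correlated in the actual data:

SELECT count(*)                                                        AS total_rows,
       sum(CASE WHEN operation_category = 'admin' THEN 1 ELSE 0 END)  AS category_admin,
       sum(CASE WHEN operation_name = 'admin' THEN 1 ELSE 0 END)      AS name_admin,
       sum(CASE WHEN operation_category = 'admin'
                 AND operation_name = 'admin' THEN 1 ELSE 0 END)      AS both_admin
FROM role_operation;

If both_admin is close to the smaller of the two individual counts, the predicates are highly correlated, and multiplying their individual selectivities will badly underestimate the row count.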

In more recent versions you could probably work around this with a
Common Table Expression (CTE) (using a WITH clause).  In 8.3 the
best idea which comes immediately to mind is to select from the
role_operation table into a temporary table using whichever of those
three criteria is most selective, and then join that temporary table
into the rest of the query.  Maybe someone else can think of
something better.
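
Very roughly, something like this (an untested sketch; the temp table name is made up, and you should filter on whichever of the three criteria is actually most selective):

CREATE TEMP TABLE admin_role_operation AS
    SELECT role_id, object_type_id, operation_category, operation_name
    FROM role_operation
    WHERE operation_name = 'admin';

ANALYZE admin_role_operation;

SELECT DISTINCT
       role_user.project_id AS projectId,
       sfuser.username AS adminUsername,
       sfuser.full_name AS adminFullName
FROM admin_role_operation ro
JOIN role ON ro.role_id = role.id
JOIN role_user ON role.id = role_user.role_id
JOIN sfuser ON role_user.user_id = sfuser.id
WHERE ro.object_type_id = 'SfMain.Project'
  AND ro.operation_category = 'admin'
ORDER BY adminFullName ASC;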

-Kevin

Re: Slow query postgres 8.3

From:
Claudio Freire
Date:
I actually implemented a statistical system for measuring these kinds
of correlations.

It's complex, but it might be adaptable to pgsql. Furthermore, one of
the latest projects of mine was to replace the purely statistical
approach with SVCs.
Too bad I won't be able to devote any time to that project before September.

On Mon, Apr 11, 2011 at 6:59 PM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:
> There has been much talk lately about how to make
> it smarter about that, but right now there's no general solution,
> and workarounds can be tricky.

Re: Slow query postgres 8.3

From:
Tomas Vondra
Date:
On 12.4.2011 09:33, Claudio Freire wrote:
> I actually implemented a statistical system for measuring these kinds
> of correlations.
>
> It's complex, but it might be adaptable to pgsql. Furthermore, one of
> the latest projects of mine was to replace the purely statistical
> approach with SVCs.

You mean Support Vector Classifiers? Interesting idea, although I don't
see how to apply that to query planning, especially with non-numeric
inputs. Could you share more details on that statistical system and how
you think it could be applied in the pgsql world?

> Too bad I won't be able to devote any time to that project before September.

I've been working on cross-column stats for some time, and although I
had to put it aside for a while, I'm going to devote more time to this
issue soon. So interesting ideas/comments are very welcome.

regards
Tomas

Re: Slow query postgres 8.3

From:
Claudio Freire
Date:
On Wed, Apr 13, 2011 at 10:16 PM, Tomas Vondra <tv@fuzzy.cz> wrote:
> You mean Support Vector Classifiers? Interesting idea, although I don't
> see how to apply that to query planning, especially with non-numeric
> inputs. Could you share more details on that statistical system and how
> you think it could be applied in the pgsql world?

Well, in my case the data was a simple list of attributes: you either
had them or not. The data was very sparse, so the task was to fill in
the missing bits.

For that, what I did was take a training set of data, and each time I
wanted to know the likelihood of a certain attribute, I would compute
its probability from the training data, conditional on a set of other
attributes.

So, for postgres, if I had an index over a few boolean columns (yeah,
bear with me) (a,b,c,d), and I wanted to know the selectivity of
"where a" when I had already accounted for "where b", I'd take my
training data and count how many of the rows that have b also have a -
that is, P(a|b).
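
In SQL terms, the estimate would look roughly like this (purely illustrative; training_sample and its boolean columns a and b are hypothetical):

SELECT sum(CASE WHEN a AND b THEN 1 ELSE 0 END)::float
       / NULLIF(sum(CASE WHEN b THEN 1 ELSE 0 END), 0) AS p_a_given_b
FROM training_sample;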

Of course, my application had to handle thousands of attributes, so I
couldn't apply conditional distributions to everything; I'd restrict
the conditional part (the "given b") to something that selected an
"appropriately sized" sample from my training data.

All that's very expensive.

So I thought... what about replacing that with an SVC - train an SVC
or SVR model for a, taking b, c, d as parameters? I never had the
opportunity to test the idea, but the SVC variant would probably be
usable by postgres, since all you need to know is b, c, d. They don't
need to be booleans, or even scalars; SVCs are very flexible.
Unknown values could easily be compensated for, with some cleverness.
The tough part is, of course, training the SVC, picking the kind of
SVC to use, and storing it in the stats tables during analyze. Oh, and
hoping it doesn't make fatal mistakes.

I know, the idea is very green, but it would be a fun project - cough GSoC ;-)