Discussion: Slow query postgres 8.3
Hi,
I am trying to tune a query that is taking too much time on a large dataset (postgres 8.3).
SELECT DISTINCT
role_user.project_id AS projectId,
sfuser.username AS adminUsername,
sfuser.full_name AS adminFullName
FROM
role_operation role_operation,
role role,
sfuser sfuser,
role_user role_user
WHERE
role_operation.role_id=role.id
AND role.id=role_user.role_id
AND role_user.user_id=sfuser.id
AND role_operation.object_type_id='SfMain.Project'
AND role_operation.operation_category='admin'
AND role_operation.operation_name='admin'
ORDER BY
adminFullName ASC
It has the following query plan:
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Unique  (cost=1218.57..1221.26 rows=269 width=35) (actual time=16700.332..17212.849 rows=30136 loops=1)
  ->  Sort  (cost=1218.57..1219.24 rows=269 width=35) (actual time=16700.306..16885.972 rows=41737 loops=1)
        Sort Key: sfuser.full_name, role_user.project_id, sfuser.username
        Sort Method: quicksort  Memory: 4812kB
        ->  Nested Loop  (cost=0.00..1207.71 rows=269 width=35) (actual time=71.173..15788.798 rows=41737 loops=1)
              ->  Nested Loop  (cost=0.00..1118.22 rows=269 width=18) (actual time=65.550..12440.383 rows=41737 loops=1)
                    ->  Nested Loop  (cost=0.00..256.91 rows=41 width=18) (actual time=19.312..7150.925 rows=6108 loops=1)
                          ->  Index Scan using role_oper_obj_oper on role_operation  (cost=0.00..85.15 rows=41 width=9) (actual time=19.196..2561.765 rows=6108 loops=1)
                                Index Cond: (((object_type_id)::text = 'SfMain.Project'::text) AND ((operation_category)::text = 'admin'::text) AND ((operation_name)::text = 'admin'::text))
                          ->  Index Scan using role_pk on role  (cost=0.00..4.18 rows=1 width=9) (actual time=0.727..0.732 rows=1 loops=6108)
                                Index Cond: ((role.id)::text = (role_operation.role_id)::text)
                    ->  Index Scan using role_user_proj_idx on role_user  (cost=0.00..20.84 rows=13 width=27) (actual time=0.301..0.795 rows=7 loops=6108)
                          Index Cond: ((role_user.role_id)::text = (role_operation.role_id)::text)
              ->  Index Scan using sfuser_pk on sfuser  (cost=0.00..0.32 rows=1 width=35) (actual time=0.056..0.062 rows=1 loops=41737)
                    Index Cond: ((sfuser.id)::text = (role_user.user_id)::text)
Total runtime: 17343.185 ms
(16 rows)
I have tried adding an index on role_operation.role_id, but it didn't seem to help. I have also tried changing the query to:
SELECT
role_user.project_id AS projectId,
sfuser.username AS adminUsername,
sfuser.full_name AS adminFullName
FROM
sfuser sfuser,
role_user role_user
WHERE
role_user.role_id in (select role_operation.role_id from role_operation where role_operation.object_type_id=
'SfMain.Project'
AND role_operation.operation_category='admin'
AND role_operation.operation_name='admin') AND role_user.user_id=sfuser.id
ORDER BY
adminFullName ASC
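For reference, the index attempt mentioned above would look something like the following. This is a sketch: the index names and the idea of a composite index are assumptions, not what was actually run.

```sql
-- Sketch of the attempted index on the join key
-- (hypothetical name; adjust to the actual schema).
CREATE INDEX role_oper_role_id_idx ON role_operation (role_id);

-- A composite index covering the three filter columns plus the join key
-- could, in principle, serve both the filter and the join from one index:
CREATE INDEX role_oper_filter_idx
    ON role_operation (object_type_id, operation_category, operation_name, role_id);
```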
None of this seemed to improve the performance.
Does anyone have a suggestion?
Thanks a lot,
Anne
> Hi,
>
> I am trying to tune a query that is taking too much time on a large
> dataset (postgres 8.3).

Hi,

run ANALYZE on the tables used in the query - the stats are very off, so
the db chooses a really bad execution plan.

Tomas
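Concretely, that suggestion amounts to refreshing the planner statistics for each table the query touches, then re-checking the plan:

```sql
-- Refresh planner statistics for the tables in the query.
ANALYZE role_operation;
ANALYZE role;
ANALYZE role_user;
ANALYZE sfuser;

-- Then re-run EXPLAIN ANALYZE on the original query and compare
-- the estimated row counts against the actual ones.
```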
Hi Tomas,

Here is the plan after running ANALYZE.

QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Unique  (cost=1330.27..1333.24 rows=297 width=35) (actual time=4011.861..4526.583 rows=30136 loops=1)
  ->  Sort  (cost=1330.27..1331.01 rows=297 width=35) (actual time=4011.828..4198.006 rows=41737 loops=1)
        Sort Key: sfuser.full_name, role_user.project_id, sfuser.username
        Sort Method: quicksort  Memory: 4812kB
        ->  Nested Loop  (cost=0.00..1318.07 rows=297 width=35) (actual time=0.622..3107.994 rows=41737 loops=1)
              ->  Nested Loop  (cost=0.00..1219.26 rows=297 width=18) (actual time=0.426..1212.175 rows=41737 loops=1)
                    ->  Nested Loop  (cost=0.00..282.11 rows=45 width=18) (actual time=0.325..371.295 rows=6108 loops=1)
                          ->  Index Scan using role_oper_obj_oper on role_operation  (cost=0.00..93.20 rows=45 width=9) (actual time=0.236..71.291 rows=6108 loops=1)
                                Index Cond: (((object_type_id)::text = 'SfMain.Project'::text) AND ((operation_category)::text = 'admin'::text) AND ((operation_name)::text = 'admin'::text))
                          ->  Index Scan using role_pk on role  (cost=0.00..4.19 rows=1 width=9) (actual time=0.025..0.030 rows=1 loops=6108)
                                Index Cond: ((role.id)::text = (role_operation.role_id)::text)
                    ->  Index Scan using role_user_proj_idx on role_user  (cost=0.00..20.66 rows=13 width=27) (actual time=0.025..0.066 rows=7 loops=6108)
                          Index Cond: ((role_user.role_id)::text = (role_operation.role_id)::text)
              ->  Index Scan using sfuser_pk on sfuser  (cost=0.00..0.32 rows=1 width=35) (actual time=0.022..0.027 rows=1 loops=41737)
                    Index Cond: ((sfuser.id)::text = (role_user.user_id)::text)
Total runtime: 4657.488 ms
(16 rows)

Is there anything more that can be done, for instance about the 1s in the index scan on sfuser?
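One session-level experiment that might show whether the nested loops themselves are the problem (a sketch I have not run, using explicit-JOIN syntax for the same query):

```sql
-- Temporarily discourage nested loops for this session only, to see
-- whether a hash or merge join plan beats the one above. Not meant as
-- a production setting.
SET enable_nestloop = off;

EXPLAIN ANALYZE
SELECT DISTINCT
       role_user.project_id AS projectId,
       sfuser.username      AS adminUsername,
       sfuser.full_name     AS adminFullName
FROM role_operation
JOIN role      ON role_operation.role_id = role.id
JOIN role_user ON role.id = role_user.role_id
JOIN sfuser    ON role_user.user_id = sfuser.id
WHERE role_operation.object_type_id     = 'SfMain.Project'
  AND role_operation.operation_category = 'admin'
  AND role_operation.operation_name     = 'admin'
ORDER BY adminFullName ASC;

RESET enable_nestloop;
```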
Thanks,
Anne

-----Original Message-----
From: tv@fuzzy.cz [mailto:tv@fuzzy.cz]
Sent: Saturday, April 09, 2011 3:36 AM
To: Anne Rosset
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Slow query postgres 8.3
"Anne Rosset" <arosset@collab.net> wrote:

> ->  Index Scan using role_oper_obj_oper on role_operation
>     (cost=0.00..93.20 rows=45 width=9) (actual time=0.236..71.291
>     rows=6108 loops=1)
>       Index Cond: (((object_type_id)::text = 'SfMain.Project'::text)
>       AND ((operation_category)::text = 'admin'::text) AND
>       ((operation_name)::text = 'admin'::text))

This looks like another case where there is a correlation among
multiple values used for selection. The optimizer assumes, for example,
that operation_category = 'admin' will be true no more often for rows
with operation_name = 'admin' than for rows with other values of
operation_name. There has been much talk lately about how to make it
smarter about that, but right now there's no general solution, and
workarounds can be tricky.

In more recent versions you could probably work around this with a
Common Table Expression (a WITH clause). In 8.3, the best idea which
comes immediately to mind is to select from the role_operation table
into a temporary table using whichever of those three criteria is most
selective, and then join that temporary table into the rest of the
query.

Maybe someone else can think of something better.

-Kevin
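Kevin's temporary-table workaround might look roughly like this. A sketch only: it assumes the three-column filter is the selective step, and the temp-table name is made up.

```sql
-- Materialize the selective filter once, so the planner gets an
-- accurate row count instead of a cross-column independence guess.
CREATE TEMP TABLE admin_role_ids AS
SELECT role_id
FROM role_operation
WHERE object_type_id     = 'SfMain.Project'
  AND operation_category = 'admin'
  AND operation_name     = 'admin';

ANALYZE admin_role_ids;  -- give the planner real stats for the temp table

SELECT DISTINCT
       role_user.project_id AS projectId,
       sfuser.username      AS adminUsername,
       sfuser.full_name     AS adminFullName
FROM admin_role_ids
JOIN role      ON role.id = admin_role_ids.role_id
JOIN role_user ON role_user.role_id = role.id
JOIN sfuser    ON sfuser.id = role_user.user_id
ORDER BY adminFullName ASC;
```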
I actually implemented a statistical system for measuring these kinds
of correlations. It's complex, but it might be adaptable to pgsql.
Furthermore, one of my latest projects was to replace the purely
statistical approach with SVCs.

Too bad I won't be able to devote any time to that project before
September.

On Mon, Apr 11, 2011 at 6:59 PM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:
> There has been much talk lately about how to make it smarter about
> that, but right now there's no general solution, and workarounds can
> be tricky.
On 12.4.2011 09:33, Claudio Freire wrote:
> I actually implemented a statistical system for measuring these kinds
> of correlations.
>
> It's complex, but it might be adaptable to pgsql. Furthermore, one of
> my latest projects was to replace the purely statistical approach
> with SVCs.

You mean Support Vector Classifiers? Interesting idea, although I don't
see how to apply that to query planning, especially with non-numeric
inputs. Could you share more details on that statistical system and how
you think it could be applied in the pgsql world?

> Too bad I won't be able to devote any time to that project before
> September.

I've been working on cross-column stats for some time, and although I
had to put it aside for a while, I'm going to devote more time to this
issue soon. So interesting ideas/comments are very welcome.

regards
Tomas
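The independence assumption being discussed here is easy to reproduce on a hypothetical table with two perfectly correlated columns:

```sql
-- Two columns that are always equal: the planner multiplies their
-- individual selectivities (1/100 * 1/100) instead of recognizing
-- that one condition implies the other.
CREATE TABLE corr_demo AS
SELECT i, i % 100 AS a, i % 100 AS b
FROM generate_series(1, 100000) AS s(i);

ANALYZE corr_demo;

EXPLAIN ANALYZE SELECT * FROM corr_demo WHERE a = 1 AND b = 1;
-- estimate: ~10 rows (100000 * 1/100 * 1/100); actual: 1000 rows
```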
On Wed, Apr 13, 2011 at 10:16 PM, Tomas Vondra <tv@fuzzy.cz> wrote:
> You mean Support Vector Classifiers? Interesting idea, although I
> don't see how to apply that to query planning, especially with
> non-numeric inputs. Could you share more details on that statistical
> system and how you think it could be applied in the pgsql world?

Well, in my case the data was a simple list of attributes. You either
had them or not, and the data was very sparse, so the task was to fill
in the missing bits.

For that, I took a training set of data, and each time I wanted to know
the likelihood of a row having a certain attribute, I would compute the
conditional probability given the training data - conditional on a set
of other attributes.

So, for postgres: if I had an index over a few boolean columns (yes,
bear with me) (a, b, c, d), and I wanted to know the selectivity of
"WHERE a" after already accounting for "WHERE b", then I'd take my
training data and count how many of the rows that have b also have a.
That is, P(a | b).

Of course, my application had to handle thousands of attributes, so I
couldn't apply conditional distributions to everything; I'd limit the
conditional part (the "given b") to something that selected an
"appropriately sized" sample from my training data. All of that is very
expensive.

So I thought: what about replacing that with an SVC - train an SVC or
SVR model for a, taking b, c, d as parameters? I never had the
opportunity to test the idea, but the SVC variant would probably be
usable by postgres, since all you need to know is b, c, d. They don't
need to be booleans, or even scalars - SVCs are very flexible. Unknown
values could easily be compensated for, with some cleverness.

The tough part is, of course, training the SVC, picking the kind of SVC
to use, and storing it in the stats tables during ANALYZE. Oh, and
hoping it doesn't make fatal mistakes.

I know, the idea is very green, but it would be a fun project - cough,
GSoC ;-)
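The conditional-probability estimate described above can itself be sketched in SQL, over a hypothetical training table with boolean columns a and b:

```sql
-- P(a | b): among training rows where b holds, what fraction also
-- have a? "training" is a hypothetical sample table, not part of the
-- schema discussed in this thread.
SELECT sum(CASE WHEN a THEN 1 ELSE 0 END)::float8 / count(*) AS p_a_given_b
FROM training
WHERE b;
```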