Обсуждение: Re: cannot use multicolumn index
here is the explain analyze output-> server=# explain analyze select count(*) from temp_by_hour where xid > 100 and xdate > now() - interval '1 week'; QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=29359311.58..29359311.59 rows=1 width=0) (actual time=2728061.589..2728061.590 rows=1 loops=1) -> Seq Scan on temp_by_hour (cost=0.00..29345234.14 rows=5630975 width=0) (actual time=560446.661..2726838.501 rows=5760724 loops=1) Filter: ((xid > 100) AND (xdate > (now() - '7 days'::interval))) Total runtime: 2728063.170 ms -- View this message in context: http://postgresql.1045698.n5.nabble.com/cannot-use-multicolumn-index-tp4802634p4802699.html Sent from the PostgreSQL - performance mailing list archive at Nabble.com.
-postgres version -> 8.4.4 -os -> redhat 5.6 -specs ->24 cores, 96GB ram, shared_buffers=32 GB -postgresql.conf -> i havent made any changes as far as the query tuning parameters are concerned. #------------------------------------------------------------------------------ # QUERY TUNING #------------------------------------------------------------------------------ # - Planner Method Configuration - #enable_bitmapscan = on #enable_hashagg = on #enable_hashjoin = on #enable_indexscan = on #enable_mergejoin = on #enable_nestloop = on #enable_seqscan = on #enable_sort = on #enable_tidscan = on # - Planner Cost Constants - #seq_page_cost = 1.0 # measured on an arbitrary scale #random_page_cost = 4.0 # same scale as above #cpu_tuple_cost = 0.01 # same scale as above #cpu_index_tuple_cost = 0.005 # same scale as above #cpu_operator_cost = 0.0025 # same scale as above effective_cache_size = 50GB # - Genetic Query Optimizer - #geqo = on #geqo_threshold = 12 #geqo_effort = 5 # range 1-10 #geqo_pool_size = 0 # selects default based on effort #geqo_generations = 0 # selects default based on effort #geqo_selection_bias = 2.0 # range 1.5-2.0 # - Other Planner Options - #default_statistics_target = 100 # range 1-10000 #constraint_exclusion = partition # on, off, or partition #cursor_tuple_fraction = 0.1 # range 0.0-1.0 #from_collapse_limit = 8 #join_collapse_limit = 8 # 1 disables collapsing of explicit # JOIN clauses -------------------------------------------------------------------------------------------------------------- if any other parameters are relative to my question pls tell me which you want and i can post them (i can post the whole postgresql.conf if it's helpful). my shared buffers ar my question apart from the specific example, is a little more general. so, is it normal to expect such an index to be used? can i write the query in another form so as to use this index? is it for example that the conditions are '>' and not '=' a factor why the index is not used? -- View this message in context: http://postgresql.1045698.n5.nabble.com/cannot-use-multicolumn-index-tp4802634p4802871.html Sent from the PostgreSQL - performance mailing list archive at Nabble.com.
On 14 Září 2011, 15:09, MirrorX wrote: > here is the explain analyze output-> > server=# explain analyze select count(*) from temp_by_hour where xid > 100 > and xdate > now() - interval '1 week'; > QUERY > PLAN > --------------------------------------------------------------------------------------------------------------------------------------------------- > Aggregate (cost=29359311.58..29359311.59 rows=1 width=0) (actual > time=2728061.589..2728061.590 rows=1 loops=1) > -> Seq Scan on temp_by_hour (cost=0.00..29345234.14 rows=5630975 > width=0) (actual time=560446.661..2726838.501 rows=5760724 loops=1) > Filter: ((xid > 100) AND (xdate > (now() - '7 days'::interval))) > Total runtime: 2728063.170 ms Sorry, but with this amount of information, no one can actually help. - What is the problem, i.e. what behaviour you expect? - How much data is the table? - What portion of it matches the conditions? - What is the index definition? My bet is the conditions are not selective enough and the index scan would be less effective than reading the whole table. Try to disable seqscan or modify the cost variables so that the index scan is used and see if it's faster or not. Tomas
thx for the answer. - What is the problem, i.e. what behaviour you expect? - How much data is the table? - What portion of it matches the conditions? - What is the index definition? i think in my first post i provided most of these details but -> 1) what i expect is to be able to understand why the index is not used and if possibly to use it somehow, or recreate it in a better way 2) the table has 115 GB and about 700 milion rows 3) the result should be less than 10 millions rows 4) the index is a btree i tried to disable seq_scan and the query plan was changed and used another index and not the one i wanted. -- View this message in context: http://postgresql.1045698.n5.nabble.com/cannot-use-multicolumn-index-tp4802634p4803198.html Sent from the PostgreSQL - performance mailing list archive at Nabble.com.
14.09.11 18:14, MirrorX написав(ла): > i think in my first post i provided most of these details but -> > 1) what i expect is to be able to understand why the index is not used and > if possibly to use it somehow, or recreate it in a better way > 2) the table has 115 GB and about 700 milion rows > 3) the result should be less than 10 millions rows > 4) the index is a btree > > i tried to disable seq_scan and the query plan was changed and used another > index and not the one i wanted. You has ">" check on both columns, this means that it has to scan each subtree that satisfy one criteria to check against the other. Here index column order is significant. E.g. if you have a lot of xid > 100 and xid is first index column, it must check all (a lot) the index subtrees for xid>100. Multicolumn indexes work best when first columns are checked with "=" and only last column with range criteria. You may still try to change order of columns in your index if this will give best selectivity on first column. Another option is multiple single column indexes - postgres may merge such an indexes at runtime (don't remember since which version this feature is available). Best regards, Vitalii Tymchyshyn.
On 14 Září 2011, 17:14, MirrorX wrote: > thx for the answer. > > - What is the problem, i.e. what behaviour you expect? > - How much data is the table? > - What portion of it matches the conditions? > - What is the index definition? > > i think in my first post i provided most of these details but -> Hmmm, I haven't received that post and I don't see that in the archives: http://archives.postgresql.org/pgsql-performance/2011-09/msg00210.php It's displayed on nabble.com, but it's marked as 'not yet accepted'. That's strange. Anyway there's still a lot of missing info - what version of PostgreSQL is this? What is the table structure, what indexes are there? > 1) what i expect is to be able to understand why the index is not used and > if possibly to use it somehow, or recreate it in a better way > 2) the table has 115 GB and about 700 milion rows Really? Because the explain analyze output you posted states there are just 5.760.724 rows, not 700.000.000. > 3) the result should be less than 10 millions rows That's about 1.5% of the rows, but it may be much larger portion of the table. The table is stored by blocks - whenever you need to read a row, you need to read the whole block. 115GB is about 15.073.280 blocks (8kB). If each row happens to be stored in a different block, you'll have to read about 66% of blocks (although you need just 1.4% of rows). Sure, in reality the assumption 'a different block for each row' is not true, but with a table this large the block probably won't stay in the cache (and thus will be read repeatedly from the device). And that's just the table - you have to read the index too (which is 35GB in this case). So it's not just about the 'row selectivity', it's about 'block selectivity' too. In short - my guess is the seq scan will be more efficient in this case, but it's hard to prove without the necessary info. > 4) the index is a btree Great, but what are the columns? What data types are used? BTW I've noticed you stated this in the first post "i have read in the manual that the multicolumn index can be used only if the clauses of the query are in the same order as the columns of the index". That's not true since 8.1, so unless you're using a very old version of PostgreSQL (8.0 or older), you may use whatever columns you want although it's not as efficient. Do you need both columns (xid, xdate) in the WHERE condition, or have you used one of them just to fulfill the 'leftmost columns' rule by adding a condition that matches everything? If that's the case, it's hardly going to improve the effectivity. I see two possible solutions: 1) partition the table and use constraint_exclusion so that just a small portion of the table is scanned - there are pros/cons of this solution 2) cluster the table by one of the columns, so that an index scan may be more effective (but this might hurt other queries and you'll have to do that repeatedly) Tomas
thank you all for your advice. i will try the table partitioning approach to reduce the size of the tables and to be able to handle them more efficiently -- View this message in context: http://postgresql.1045698.n5.nabble.com/cannot-use-multicolumn-index-tp4802634p4806239.html Sent from the PostgreSQL - performance mailing list archive at Nabble.com.