Обсуждение: Index ot being used
Hi all,
  I have an index on a table that doesn't seem to want to be used. I'm
hopig someone might be able to help point me in the right direction.
My index is (typed, not copied):
tle-bu=> \d file_info_7_display_idx;
 Index "public.file_info_7_display_idx"
     Column      |          Type
-----------------+----------------------
 file_type       | character varying(2)
 file_parent_dir | text
 file_name       | text
btree, for table "public.file_info_7"
tle-bu=> EXPLAIN ANALYZE SELECT file_type, file_parent_dir, file_name FROM
file_info_7;
                                        QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Seq Scan on file_info_7  (cost=0.00..11028.35 rows=294035 width=118)
(actual time=0.122..2707.764 rows=294035 loops=1)
 Total runtime: 3717.862 ms
(2 rows)
  Can anyone see what's wrong? Should I post the table schema? Thanks all!
Madison
			
		[linux@alteeve.com - Fri at 12:10:19PM -0400] > tle-bu=> EXPLAIN ANALYZE SELECT file_type, file_parent_dir, file_name FROM > file_info_7; > QUERY PLAN > ---------------------------------------------------------------------------------------------------------------------- > Seq Scan on file_info_7 (cost=0.00..11028.35 rows=294035 width=118) > (actual time=0.122..2707.764 rows=294035 loops=1) > Total runtime: 3717.862 ms > (2 rows) > As far as I can see, you are selecting everything from the table without any sort order. The only rational thing to do then is a sequential scan, it's no point in an index scan. -- Tobias Brox, +47-91700050
Hi, At 18:10 10/06/2005, linux@alteeve.com wrote: >tle-bu=> EXPLAIN ANALYZE SELECT file_type, file_parent_dir, file_name FROM >file_info_7; What could the index be used for? Unless you have some WHERE or (in some cases) ORDER BY clause, there's absolutely no need for an index, since you are just asking for all rows from the table... Jacques.
Tobias Brox wrote:
> [linux@alteeve.com - Fri at 12:10:19PM -0400]
>
>>tle-bu=> EXPLAIN ANALYZE SELECT file_type, file_parent_dir, file_name FROM
>>file_info_7;
>>                                        QUERY PLAN
>>----------------------------------------------------------------------------------------------------------------------
>> Seq Scan on file_info_7  (cost=0.00..11028.35 rows=294035 width=118)
>>(actual time=0.122..2707.764 rows=294035 loops=1)
>> Total runtime: 3717.862 ms
>>(2 rows)
>>
>
>
> As far as I can see, you are selecting everything from the table without any
> sort order.  The only rational thing to do then is a sequential scan, it's
> no point in an index scan.
>
   Thanks for replying, Tobias and Jacques!
   Doh! This is a case of over simplification, I think. I was trying to
simplify my query as much as I could and then work it out to the actual
query I want. It would seem I don't understand how to use indexes quite
right. Do you think you might be able to help me with a useful index?
   Here is the 'file_info_7' schema, my query and the 'explain analyze'
results:
tle-bu=> \d file_info_7
                               Table "public.file_info_7"
         Column        |         Type         |                Modifiers
----------------------+----------------------+-----------------------------------------
  file_group_name      | text                 |
  file_group_uid       | bigint               | not null
  file_mod_time        | bigint               | not null
  file_name            | text                 | not null
  file_parent_dir      | text                 | not null
  file_perm            | text                 | not null
  file_size            | bigint               | not null
  file_type            | character varying(2) | not null default
'f'::character varying
  file_user_name       | text                 |
  file_user_uid        | bigint               | not null
  file_backup          | boolean              | not null default true
  file_display         | boolean              | not null default false
  file_restore_display | boolean              | not null default false
  file_restore         | boolean              | not null default false
Indexes:
     "file_info_7_display_idx" btree (file_type, file_parent_dir, file_name)
   Here is my full query:
tle-bu=> EXPLAIN ANALYZE SELECT file_name, file_parent_dir, file_type
FROM file_info_7 WHERE file_type='d' ORDER BY file_parent_dir ASC,
file_name ASC;
                                                          QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
  Sort  (cost=14541.24..14603.48 rows=24895 width=118) (actual
time=15751.804..15967.591 rows=25795 loops=1)
    Sort Key: file_parent_dir, file_name
    ->  Seq Scan on file_info_7  (cost=0.00..11763.44 rows=24895
width=118) (actual time=19.289..3840.845 rows=25795 loops=1)
          Filter: ((file_type)::text = 'd'::text)
  Total runtime: 16043.075 ms
(5 rows)
   This is my index (which I guess is wrong):
tle-bu=> \d file_info_7_display_idx
  Index "public.file_info_7_display_idx"
      Column      |         Type
-----------------+----------------------
  file_type       | character varying(2)
  file_parent_dir | text
  file_name       | text
btree, for table "public.file_info_7"
   Those are the three columns I am using in my restrictions so I
thought that would create an index this query would use. Do I need to do
something different because of the 'ORDER BY...'?
   Thanks again for the replies!
Madison
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Madison Kelly (Digimer)
TLE-BU, The Linux Experience; Back Up
http://tle-bu.thelinuxexperience.com
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
			
		Madison Kelly <linux@alteeve.com> writes:
>    Here is my full query:
> tle-bu=> EXPLAIN ANALYZE SELECT file_name, file_parent_dir, file_type
> FROM file_info_7 WHERE file_type='d' ORDER BY file_parent_dir ASC,
> file_name ASC;
>    This is my index (which I guess is wrong):
> tle-bu=> \d file_info_7_display_idx
>   Index "public.file_info_7_display_idx"
>       Column      |         Type
> -----------------+----------------------
>   file_type       | character varying(2)
>   file_parent_dir | text
>   file_name       | text
> btree, for table "public.file_info_7"
The index is fine, but you need to phrase the query as
    ... ORDER BY file_type, file_parent_dir, file_name;
(Whether you use ASC or not doesn't matter.)  Otherwise the planner
won't make the connection to the sort ordering of the index.
            regards, tom lane
			
		On Sun, Jun 12, 2005 at 10:12:27 -0400, Madison Kelly <linux@alteeve.com> wrote: > Indexes: > "file_info_7_display_idx" btree (file_type, file_parent_dir, file_name) > Here is my full query: > > tle-bu=> EXPLAIN ANALYZE SELECT file_name, file_parent_dir, file_type > FROM file_info_7 WHERE file_type='d' ORDER BY file_parent_dir ASC, > file_name ASC; > QUERY PLAN > ----------------------------------------------------------------------------------------------------------------------------- This is a case where postgres's planner can't make a deduction needed for it to realize that the index can be used. Try rewriting the query as: SELECT file_name, file_parent_dir, file_type FROM file_info_7 WHERE file_type='d' ORDER BY file_type ASC, file_parent_dir ASC, file_name ASC;
Tom Lane wrote:
> Madison Kelly <linux@alteeve.com> writes:
>
>>   Here is my full query:
>
>
>>tle-bu=> EXPLAIN ANALYZE SELECT file_name, file_parent_dir, file_type
>>FROM file_info_7 WHERE file_type='d' ORDER BY file_parent_dir ASC,
>>file_name ASC;
>
>
>>   This is my index (which I guess is wrong):
>
>
>>tle-bu=> \d file_info_7_display_idx
>>  Index "public.file_info_7_display_idx"
>>      Column      |         Type
>>-----------------+----------------------
>>  file_type       | character varying(2)
>>  file_parent_dir | text
>>  file_name       | text
>>btree, for table "public.file_info_7"
>
>
> The index is fine, but you need to phrase the query as
>
>     ... ORDER BY file_type, file_parent_dir, file_name;
>
> (Whether you use ASC or not doesn't matter.)  Otherwise the planner
> won't make the connection to the sort ordering of the index.
>
>             regards, tom lane
Hi Tom and Bruno,
   After sending that email I kept plucking away and in the course of
doing so decided that I didn't need to return the 'file_type' column.
Other than that, it would see my query now matches what you two have
recommended in the 'ORDER BY...' front but I still can't get an index
search.
   Here is the latest query and the new index:
tle-bu=> \d file_info_7_display_idx;
Index "public.file_info_7_display_idx"
      Column      | Type
-----------------+------
  file_parent_dir | text
  file_name       | text
btree, for table "public.file_info_7"
tle-bu=> EXPLAIN ANALYZE SELECT file_parent_dir, file_name, file_display
FROM file_info_7 WHERE file_type='d' ORDER BY file_parent_dir ASC,
file_name ASC;
                                                          QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
  Sort  (cost=14509.53..14571.76 rows=24895 width=114) (actual
time=19995.250..20123.874 rows=25795 loops=1)
    Sort Key: file_parent_dir, file_name
    ->  Seq Scan on file_info_7  (cost=0.00..11762.44 rows=24895
width=114) (actual time=0.123..3228.446 rows=25795 loops=1)
          Filter: ((file_type)::text = 'd'::text)
  Total runtime: 20213.443 ms
   The 'Sort' is taking 20 seconds on my pentium III 1GHz (not great,
but...). If I follow you right, my index is 'file_parent_dir' first and
'file_name' second (does order matter?). So I figured the query:
SELECT file_parent_dir, file_name, file_display
FROM file_info_7
WHERE file_type='d'
ORDER BY file_parent_dir ASC, file_name ASC;
   Would hit the index for the sort. Is there any other way other than
'EXPLAIN ANALYZE...' to get a better understanding of what is happening
in there? For what it's worth, there is a little under 300,000 entries
in this table of which, as you can see above, 25,795 are being returned.
   Yet again, thank you both!! I'm off to keep trying to figure this out...
Madison
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Madison Kelly (Digimer)
TLE-BU, The Linux Experience; Back Up
http://tle-bu.thelinuxexperience.com
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
			
		On Sun, Jun 12, 2005 at 18:52:05 -0400, Madison Kelly <linux@alteeve.com> wrote: > > After sending that email I kept plucking away and in the course of > doing so decided that I didn't need to return the 'file_type' column. > Other than that, it would see my query now matches what you two have > recommended in the 'ORDER BY...' front but I still can't get an index > search. No it doesn't. Even if you don't return file_type you still need it in the order by clause if you want postgres to consider using your index. Is there some reason you didn't actually try out our suggestion, but are now asking for more advice? > > Here is the latest query and the new index: > > tle-bu=> \d file_info_7_display_idx; > Index "public.file_info_7_display_idx" > Column | Type > -----------------+------ > file_parent_dir | text > file_name | text > btree, for table "public.file_info_7" > > tle-bu=> EXPLAIN ANALYZE SELECT file_parent_dir, file_name, file_display > FROM file_info_7 WHERE file_type='d' ORDER BY file_parent_dir ASC, > file_name ASC;
On Sun, Jun 12, 2005 at 22:00:01 -0500, Bruno Wolff III <bruno@wolff.to> wrote: > On Sun, Jun 12, 2005 at 18:52:05 -0400, > Madison Kelly <linux@alteeve.com> wrote: > > > > After sending that email I kept plucking away and in the course of > > doing so decided that I didn't need to return the 'file_type' column. > > Other than that, it would see my query now matches what you two have > > recommended in the 'ORDER BY...' front but I still can't get an index > > search. > > No it doesn't. Even if you don't return file_type you still need it > in the order by clause if you want postgres to consider using your > index. I didn't notice that you had changed the index. The reason this index doesn't help is that you can't use it to select on records with the desired file_type. > > Is there some reason you didn't actually try out our suggestion, but are > now asking for more advice? > > > > > Here is the latest query and the new index: > > > > tle-bu=> \d file_info_7_display_idx; > > Index "public.file_info_7_display_idx" > > Column | Type > > -----------------+------ > > file_parent_dir | text > > file_name | text > > btree, for table "public.file_info_7" > > > > tle-bu=> EXPLAIN ANALYZE SELECT file_parent_dir, file_name, file_display > > FROM file_info_7 WHERE file_type='d' ORDER BY file_parent_dir ASC, > > file_name ASC; > > ---------------------------(end of broadcast)--------------------------- > TIP 5: Have you checked our extensive FAQ? > > http://www.postgresql.org/docs/faq
Bruno Wolff III wrote:
> On Sun, Jun 12, 2005 at 18:52:05 -0400,
>   Madison Kelly <linux@alteeve.com> wrote:
>
>>  After sending that email I kept plucking away and in the course of
>>doing so decided that I didn't need to return the 'file_type' column.
>>Other than that, it would see my query now matches what you two have
>>recommended in the 'ORDER BY...' front but I still can't get an index
>>search.
>
>
> No it doesn't. Even if you don't return file_type you still need it
> in the order by clause if you want postgres to consider using your
> index.
>
> Is there some reason you didn't actually try out our suggestion, but are
> now asking for more advice?
No good excuse.
I'll recreate the index and test out your suggestion...
tle-bu=> EXPLAIN ANALYZE SELECT file_name, file_parent_dir, file_type
FROM file_info_7 WHERE file_type='d' ORDER BY file_type ASC,
file_parent_dir ASC, file_name ASC;
                                                          QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
  Sort  (cost=14789.92..14857.06 rows=26856 width=117) (actual
time=16865.473..16989.104 rows=25795 loops=1)
    Sort Key: file_type, file_parent_dir, file_name
    ->  Seq Scan on file_info_7  (cost=0.00..11762.44 rows=26856
width=117) (actual time=0.178..1920.413 rows=25795 loops=1)
          Filter: ((file_type)::text = 'd'::text)
  Total runtime: 17102.925 ms
(5 rows)
tle-bu=> \d file_info_7_display_idx  Index "public.file_info_7_display_idx"
      Column      |         Type
-----------------+----------------------
  file_type       | character varying(2)
  file_parent_dir | text
  file_name       | text
btree, for table "public.file_info_7"
   I'm still getting the sequential scan.
Madison
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Madison Kelly (Digimer)
TLE-BU, The Linux Experience; Back Up
http://tle-bu.thelinuxexperience.com
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
			
		Bruno Wolff III wrote: > On Sun, Jun 12, 2005 at 22:00:01 -0500, > Bruno Wolff III <bruno@wolff.to> wrote: > >>On Sun, Jun 12, 2005 at 18:52:05 -0400, >> Madison Kelly <linux@alteeve.com> wrote: >> >>> After sending that email I kept plucking away and in the course of >>>doing so decided that I didn't need to return the 'file_type' column. >>>Other than that, it would see my query now matches what you two have >>>recommended in the 'ORDER BY...' front but I still can't get an index >>>search. >> >>No it doesn't. Even if you don't return file_type you still need it >>in the order by clause if you want postgres to consider using your >>index. > > > I didn't notice that you had changed the index. The reason this index > doesn't help is that you can't use it to select on records with the > desired file_type. As you probably saw in my last reply, I went back to the old index and tried the query you and Tom Lane recommended. Should this not have caught the index? At any rate, I am re-reading the documents on indexing for 7.4.x on postgresql.org... This is kind of flustering. Thanks again though for som much help! Madison -- -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Madison Kelly (Digimer) TLE-BU, The Linux Experience; Back Up http://tle-bu.thelinuxexperience.com -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
On Sun, Jun 12, 2005 at 23:42:05 -0400, Madison Kelly <linux@alteeve.com> wrote: > > As you probably saw in my last reply, I went back to the old index and > tried the query you and Tom Lane recommended. Should this not have > caught the index? Probably, but there might be some other reason the planner thought it was better to not use it. Using indexes is not always faster. It would help to see your latest definition of the table and indexes, the exact query you used and explain analyze output.
Bruno Wolff III wrote:
> On Sun, Jun 12, 2005 at 23:42:05 -0400,
>   Madison Kelly <linux@alteeve.com> wrote:
>
>>As you probably saw in my last reply, I went back to the old index and
>>tried the query you and Tom Lane recommended. Should this not have
>>caught the index?
>
>
> Probably, but there might be some other reason the planner thought it
> was better to not use it. Using indexes is not always faster.
>
> It would help to see your latest definition of the table and indexes,
> the exact query you used and explain analyze output.
>
Okay, here's what I have at the moment:
tle-bu=> \d file_info_7                               Table
"public.file_info_7"
         Column        |         Type         |                Modifiers
----------------------+----------------------+-----------------------------------------
  file_group_name      | text                 |
  file_group_uid       | bigint               | not null
  file_mod_time        | bigint               | not null
  file_name            | text                 | not null
  file_parent_dir      | text                 | not null
  file_perm            | text                 | not null
  file_size            | bigint               | not null
  file_type            | character varying(2) | not null default
'f'::character varying
  file_user_name       | text                 |
  file_user_uid        | bigint               | not null
  file_backup          | boolean              | not null default true
  file_display         | boolean              | not null default false
  file_restore_display | boolean              | not null default false
  file_restore         | boolean              | not null default false
Indexes:
     "file_info_7_display_idx" btree (file_parent_dir, file_name)
tle-bu=> \d file_info_7_display_idx
Index "public.file_info_7_display_idx"
      Column      | Type
-----------------+------
  file_parent_dir | text
  file_name       | text
btree, for table "public.file_info_7"
tle-bu=> EXPLAIN ANALYZE SELECT file_name, file_parent_dir, file_display
FROM file_info_7 WHERE file_type='d' ORDER BY file_parent_dir ASC,
file_name ASC;
                                                          QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
  Sort  (cost=15091.53..15165.29 rows=29502 width=114) (actual
time=12834.933..12955.136 rows=25795 loops=1)
    Sort Key: file_parent_dir, file_name
    ->  Seq Scan on file_info_7  (cost=0.00..11762.44 rows=29502
width=114) (actual time=0.244..2533.388 rows=25795 loops=1)
          Filter: ((file_type)::text = 'd'::text)
  Total runtime: 13042.421 ms
(5 rows)
   Since my last post I went back to a query closer to what I actually
want. What is most important to me is that 'file_parent_dir, file_name,
file_display' are returned and that the results are sorted by
'file_parent_dir, file_name' and the results are restricted to where
'file_info='d''.
   Basically what I am trying to do is display a directory tree in a
file browser. I had this working before but it was far, far too slow
once the number of directories to display got much higher than 1,000.
That is what 'file_display' is, by the way.
   Again, thank you!
Madison
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Madison Kelly (Digimer)
TLE-BU, The Linux Experience; Back Up
http://tle-bu.thelinuxexperience.com
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
			
		On Mon, Jun 13, 2005 at 00:29:08 -0400, Madison Kelly <linux@alteeve.com> wrote: > Bruno Wolff III wrote: > >On Sun, Jun 12, 2005 at 23:42:05 -0400, > > Madison Kelly <linux@alteeve.com> wrote: > > > >>As you probably saw in my last reply, I went back to the old index and > >>tried the query you and Tom Lane recommended. Should this not have > >>caught the index? > > > > > >Probably, but there might be some other reason the planner thought it > >was better to not use it. Using indexes is not always faster. > > > >It would help to see your latest definition of the table and indexes, > >the exact query you used and explain analyze output. > > > > Okay, here's what I have at the moment: > > tle-bu=> \d file_info_7 Table > "public.file_info_7" > Column | Type | Modifiers > ----------------------+----------------------+----------------------------------------- > file_group_name | text | > file_group_uid | bigint | not null > file_mod_time | bigint | not null > file_name | text | not null > file_parent_dir | text | not null > file_perm | text | not null > file_size | bigint | not null > file_type | character varying(2) | not null default > 'f'::character varying > file_user_name | text | > file_user_uid | bigint | not null > file_backup | boolean | not null default true > file_display | boolean | not null default false > file_restore_display | boolean | not null default false > file_restore | boolean | not null default false > Indexes: > "file_info_7_display_idx" btree (file_parent_dir, file_name) > > > tle-bu=> \d file_info_7_display_idx > Index "public.file_info_7_display_idx" > Column | Type > -----------------+------ > file_parent_dir | text > file_name | text > btree, for table "public.file_info_7" > > > tle-bu=> EXPLAIN ANALYZE SELECT file_name, file_parent_dir, file_display > FROM file_info_7 WHERE file_type='d' ORDER BY file_parent_dir ASC, > file_name ASC; > QUERY PLAN > ---------------------------------------------------------------------------------------------------------------------------- > Sort (cost=15091.53..15165.29 rows=29502 width=114) (actual > time=12834.933..12955.136 rows=25795 loops=1) > Sort Key: file_parent_dir, file_name > -> Seq Scan on file_info_7 (cost=0.00..11762.44 rows=29502 > width=114) (actual time=0.244..2533.388 rows=25795 loops=1) > Filter: ((file_type)::text = 'd'::text) > Total runtime: 13042.421 ms > (5 rows) > > > Since my last post I went back to a query closer to what I actually > want. What is most important to me is that 'file_parent_dir, file_name, > file_display' are returned and that the results are sorted by > 'file_parent_dir, file_name' and the results are restricted to where > 'file_info='d''. I am guessing you mean 'file_type' instead of 'file_info'. To do this efficiently you want an index on (file_type, file_parent_dir, file_name). Currently you only have an index on (file_parent_dir, file_name) which won't help for this query. You also need to order by file_type even though it will be constant for all of the returned rows in order to help out the planner. This will allow an index scan over the desired rows that returns them in the desired order. Please actually try this before changing anything else.
It sure would be nice if the optimizer would consider that it had the
leeway to add any column which was restricted to a single value to any
point in the ORDER BY clause.  Without that, the application programmer
has to know what indexes are on the table, rather than being able to
just worry about the set of data they want.  Obviously, if a column can
have only one value in the result set, adding to any point in the ORDER
BY can't change anything but performance.  That sure sounds like
something which should fall within the scope of an optimizer.
It really should be a DBA function to add or drop indexes to tune the
performance of queries, without requiring application programmers to
modify the queries for every DBA adjustment.  (When you have a database
with over 350 tables and thousands of queries, you really begin to
appreciate the importance of this.)
>>> Tom Lane <tgl@sss.pgh.pa.us> 06/12/05 10:56 AM >>>
Madison Kelly <linux@alteeve.com> writes:
>    Here is my full query:
> tle-bu=> EXPLAIN ANALYZE SELECT file_name, file_parent_dir, file_type
> FROM file_info_7 WHERE file_type='d' ORDER BY file_parent_dir ASC,
> file_name ASC;
>    This is my index (which I guess is wrong):
> tle-bu=> \d file_info_7_display_idx
>   Index "public.file_info_7_display_idx"
>       Column      |         Type
> -----------------+----------------------
>   file_type       | character varying(2)
>   file_parent_dir | text
>   file_name       | text
> btree, for table "public.file_info_7"
The index is fine, but you need to phrase the query as
    ... ORDER BY file_type, file_parent_dir, file_name;
(Whether you use ASC or not doesn't matter.)  Otherwise the planner
won't make the connection to the sort ordering of the index.
            regards, tom lane
---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if
your
      joining column's datatypes do not match
			
		Kevin Grittner wrote: >It sure would be nice if the optimizer would consider that it had the >leeway to add any column which was restricted to a single value to any >point in the ORDER BY clause. Without that, the application programmer >has to know what indexes are on the table, rather than being able to >just worry about the set of data they want. Obviously, if a column can >have only one value in the result set, adding to any point in the ORDER >BY can't change anything but performance. That sure sounds like >something which should fall within the scope of an optimizer. > >It really should be a DBA function to add or drop indexes to tune the >performance of queries, without requiring application programmers to >modify the queries for every DBA adjustment. (When you have a database >with over 350 tables and thousands of queries, you really begin to >appreciate the importance of this.) > > I agree that having a smarter optimizer, which can recognize when an index can be used for ORDER BY would be useful. I don't know if there are specific reasons why not, other than just not being implemented yet. It might be tricky to get it correct (for instance, how do you know which columns can be added, which ones will be constant) Perhaps you could just potentially add the WHERE items if they have an equality constraint with a constant. But I'm guessing there are more cases than that where the optimization could be performed. Also, the more options you give the planner, the longer it takes on average to plan any single query. Yes, it is beneficial for this use case, but does that balance out slowing down all the other queries by a tiny bit. I'm guessing the optimization wasn't as important as some of the others that have been done, so it hasn't been implemented yet. John =:->
Вложения
On Mon, Jun 13, 2005 at 09:51:57 -0500, John A Meinel <john@arbash-meinel.com> wrote: > > I don't know if there are specific reasons why not, other than just not > being implemented yet. It might be tricky to get it correct (for > instance, how do you know which columns can be added, which ones will be > constant) Perhaps you could just potentially add the WHERE items if they > have an equality constraint with a constant. But I'm guessing there are > more cases than that where the optimization could be performed. I think there is already some intelligence about which expressions are constant in particular parts of a plan. I think you need to be able to do two things. One is to drop constant expressions from order by lists. The other is when looking for an index to produce a specific ordering, to ingore leading constant expressions when comparing to the order by expressions. > Also, the more options you give the planner, the longer it takes on > average to plan any single query. Yes, it is beneficial for this use > case, but does that balance out slowing down all the other queries by a > tiny bit. But there aren't that many possible indexes, so I don't expect this will slow things down much more than the current check for potentially useful indexes.
>   John A Meinel <john@arbash-meinel.com> wrote:
>> I don't know if there are specific reasons why not, other than just not
>> being implemented yet. It might be tricky to get it correct
Not so much tricky to get correct, as potentially expensive to test for;
it'd be quite easy to waste a lot of cycles trying to match ORDER BY
keys in multiple ways to completely-irrelevant indexes.  Since this
will only be helpful for a minority of queries but the costs would be
paid on almost everything with an ORDER BY, that consideration has been
looming large in my mind.
Bruno Wolff III <bruno@wolff.to> writes:
> I think you need to be able to do two things. One is to drop constant
> expressions from order by lists. The other is when looking for an index
> to produce a specific ordering, to ingore leading constant expressions
> when comparing to the order by expressions.
I've been thinking about this some more this morning, and I think I see
how it could be relatively inexpensive to recognize x=constant
restrictions that allow ordering columns of an index to be ignored.  We
are already doing 90% of the work for that just as a byproduct of trying
to match the x=constant clause to the index in the first place, so it's
mostly a matter of refactoring the code to allow that work to be reused.
I don't, however, see an equally inexpensive way to ignore ORDER BY
columns.  That would imply associating the '=' operator of the
restriction clause with the '<' or '>' operator of the ORDER BY clause,
which means searching for a btree opclass that has them in common, which
is not cheap since there's no indexing on pg_amop that would allow us to
find it easily.  (There are various places where we do in fact do that
sort of thing, but they aren't so performance-critical.)  This doesn't
come up in the other case because we already know the relevant opclass
from the index.
I don't think the use-case has been shown that justifies doing this much
work to ignore useless ORDER BY clauses.  The examples that have come up
in the past all suggest ignoring index columns not the other way 'round.
Can you make a case that we need to do that part of it?
            regards, tom lane
			
		I agree that ignoring useless columns in an ORDER BY clause is less important than ignoring index columns where the value is fixed. There is one use case for ignoring useless ORDER BY columns that leaps to mind, however -- a column is added to the ORDER BY clause of a query to help out the optimizer, then the indexes are modified such that that column is no longer useful. Whether this merits the programming effort and performance hit you describe seems highly questionable, though. -Kevin >>> Tom Lane <tgl@sss.pgh.pa.us> 06/13/05 11:22 AM >>> I don't think the use-case has been shown that justifies doing this much work to ignore useless ORDER BY clauses. The examples that have come up in the past all suggest ignoring index columns not the other way 'round. Can you make a case that we need to do that part of it?
On Mon, Jun 13, 2005 at 12:22:14 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > I don't think the use-case has been shown that justifies doing this much > work to ignore useless ORDER BY clauses. The examples that have come up > in the past all suggest ignoring index columns not the other way 'round. > Can you make a case that we need to do that part of it? I don't think so. I don't think people are likely to order by constant expressions except by adding them to the front to help optimization. When I was thinking about this I was looking at what equivalences could be used and didn't look back to see which ones would be useful in the normal case. And I think it is a lot more likely people will leave out columns they know not to be relevant than to include them.
On Mon, Jun 13, 2005 at 11:46:46 -0500, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote: > I agree that ignoring useless columns in an ORDER BY clause is less > important than ignoring index columns where the value is fixed. There > is one use case for ignoring useless ORDER BY columns that leaps to > mind, however -- a column is added to the ORDER BY clause of a query to > help out the optimizer, then the indexes are modified such that that > column is no longer useful. Whether this merits the programming effort > and performance hit you describe seems highly questionable, though. I suspect that this isn't a big deal. There was a question like that that has been going back and forth over the last couple of days. If you remove the constant expression from the index, you aren't likely going to use the index anyway, but will instead sort the output rows from either a sequential scan or an index scan based on an index that does use the constant expression.
Bruno Wolff III wrote:
> I am guessing you mean 'file_type' instead of 'file_info'.
>
> To do this efficiently you want an index on (file_type, file_parent_dir,
> file_name). Currently you only have an index on (file_parent_dir, file_name)
> which won't help for this query. You also need to order by file_type
> even though it will be constant for all of the returned rows in order
> to help out the planner. This will allow an index scan over the desired
> rows that returns them in the desired order.
>
> Please actually try this before changing anything else.
   If I follow then I tried it but still got the sequential scan. Here's
the index and query (copied from the 'psql' shell):
tle-bu=> \d file_info_7_display_idx  Index "public.file_info_7_display_idx"
      Column      |         Type
-----------------+----------------------
  file_type       | character varying(2)
  file_parent_dir | text
  file_name       | text
btree, for table "public.file_info_7"
tle-bu=> EXPLAIN ANALYZE SELECT file_name, file_parent_dir, file_display
FROM file_info_7 WHERE file_type='d' ORDER BY file_type ASC,
file_parent_dir ASC, file_name ASC;
                                                          QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
  Sort  (cost=14810.92..14874.65 rows=25490 width=119) (actual
time=15523.767..15731.136 rows=25795 loops=1)
    Sort Key: file_type, file_parent_dir, file_name
    ->  Seq Scan on file_info_7  (cost=0.00..11956.84 rows=25490
width=119) (actual time=0.132..2164.757 rows=25795 loops=1)
          Filter: ((file_type)::text = 'd'::text)
  Total runtime: 15884.188 ms
(5 rows)
   If I follow all three 'ORDER BY...' items match the three columns in
the index.
   Again, thanks!
Madison
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Madison Kelly (Digimer)
TLE-BU, The Linux Experience; Back Up
http://tle-bu.thelinuxexperience.com
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
			
		Kevin Grittner wrote: >>tle-bu=> \d file_info_7_display_idx >> Index "public.file_info_7_display_idx" >> Column | Type >>-----------------+---------------------- >> file_type | character varying(2) >> file_parent_dir | text >> file_name | text >>btree, for table "public.file_info_7" > > > The index is fine, but you need to phrase the query as > > ... ORDER BY file_type, file_parent_dir, file_name; > > (Whether you use ASC or not doesn't matter.) Otherwise the planner > won't make the connection to the sort ordering of the index. > > regards, tom lane With Bruno's help I've gone back and tried just this with no luck. I've re-written the query to include all three items in the 'ORDER BY...' column in the same order but the sort still takes a long time and a sequential scan is being done instead of using the index. For what it's worth, and being somewhat of a n00b, I agree with the idea of a smarter, more flexible planner. I guess the trade off is the added overhead neaded versus the size of the average query. Madison -- -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Madison Kelly (Digimer) TLE-BU, The Linux Experience; Back Up http://tle-bu.thelinuxexperience.com -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Madison Kelly <linux@alteeve.com> writes:
> Bruno Wolff III wrote:
>> Please actually try this before changing anything else.
>    If I follow then I tried it but still got the sequential scan.
Given the fairly large number of rows being selected, it seems likely
that the planner thinks this is faster than an indexscan.  It could
be right, too.  Have you tried "set enable_seqscan = off" to see if
the index is used then?  If so, is it faster or slower?  Comparing
EXPLAIN ANALYZE results with enable_seqscan on and off would be useful.
            regards, tom lane
			
		Tom Lane wrote:
> Madison Kelly <linux@alteeve.com> writes:
>
>>Bruno Wolff III wrote:
>>
>>>Please actually try this before changing anything else.
>
>
>>   If I follow then I tried it but still got the sequential scan.
>
>
> Given the fairly large number of rows being selected, it seems likely
> that the planner thinks this is faster than an indexscan.  It could
> be right, too.  Have you tried "set enable_seqscan = off" to see if
> the index is used then?  If so, is it faster or slower?  Comparing
> EXPLAIN ANALYZE results with enable_seqscan on and off would be useful.
Wow!
With the sequence scan off my query took less than 2sec. When I turned
it back on the time jumped back up to just under 14sec.
tle-bu=> set enable_seqscan = off; SET
tle-bu=> EXPLAIN ANALYZE SELECT file_name, file_parent_dir, file_display
FROM file_info_7 WHERE file_type='d' ORDER BY file_type ASC,
file_parent_dir ASC, file_name ASC;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------
  Index Scan using file_info_7_display_idx on file_info_7
(cost=0.00..83171.78 rows=25490 width=119) (actual
time=141.405..1700.459 rows=25795 loops=1)
    Index Cond: ((file_type)::text = 'd'::text)
  Total runtime: 1851.366 ms
(3 rows)
tle-bu=> set enable_seqscan = on; SET
tle-bu=> EXPLAIN ANALYZE SELECT file_name, file_parent_dir, file_display
FROM file_info_7 WHERE file_type='d' ORDER BY file_type ASC,
file_parent_dir ASC, file_name ASC;
                                                          QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
  Sort  (cost=14810.92..14874.65 rows=25490 width=119) (actual
time=13605.185..13728.436 rows=25795 loops=1)
    Sort Key: file_type, file_parent_dir, file_name
    ->  Seq Scan on file_info_7  (cost=0.00..11956.84 rows=25490
width=119) (actual time=0.048..2018.996 rows=25795 loops=1)
          Filter: ((file_type)::text = 'd'::text)
  Total runtime: 13865.830 ms
(5 rows)
   So the index obiously provides a major performance boost! I just need
to figure out how to tell the planner how to use it...
Madison
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Madison Kelly (Digimer)
TLE-BU, The Linux Experience; Back Up
http://tle-bu.thelinuxexperience.com
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
			
		On Mon, Jun 13, 2005 at 15:05:00 -0400, Madison Kelly <linux@alteeve.com> wrote: > Wow! > > With the sequence scan off my query took less than 2sec. When I turned > it back on the time jumped back up to just under 14sec. > > > tle-bu=> set enable_seqscan = off; SET > tle-bu=> EXPLAIN ANALYZE SELECT file_name, file_parent_dir, file_display > FROM file_info_7 WHERE file_type='d' ORDER BY file_type ASC, > file_parent_dir ASC, file_name ASC; > > QUERY PLAN > -------------------------------------------------------------------------------------------------------------------------------------------------------- > Index Scan using file_info_7_display_idx on file_info_7 > (cost=0.00..83171.78 rows=25490 width=119) (actual > time=141.405..1700.459 rows=25795 loops=1) > Index Cond: ((file_type)::text = 'd'::text) > Total runtime: 1851.366 ms > (3 rows) > > > tle-bu=> set enable_seqscan = on; SET > tle-bu=> EXPLAIN ANALYZE SELECT file_name, file_parent_dir, file_display > FROM file_info_7 WHERE file_type='d' ORDER BY file_type ASC, > file_parent_dir ASC, file_name ASC; > QUERY PLAN > ---------------------------------------------------------------------------------------------------------------------------- > Sort (cost=14810.92..14874.65 rows=25490 width=119) (actual > time=13605.185..13728.436 rows=25795 loops=1) > Sort Key: file_type, file_parent_dir, file_name > -> Seq Scan on file_info_7 (cost=0.00..11956.84 rows=25490 > width=119) (actual time=0.048..2018.996 rows=25795 loops=1) > Filter: ((file_type)::text = 'd'::text) > Total runtime: 13865.830 ms > (5 rows) > > So the index obiously provides a major performance boost! I just need > to figure out how to tell the planner how to use it... The two things you probably want to look at are (in postgresql.conf): effective_cache_size = 10000 # typically 8KB each random_page_cost = 2 # units are one sequential page fetch cost Increasing effective cache size and decreasing the penalty for random disk fetches will favor using index scans. People have reported that dropping random_page_cost from the default of 4 to 2 works well. Effective cache size should be set to some reasonable estimate of the memory available on your system to postgres, not counting that set aside for shared buffers. However, since the planner thought the index scan plan was going to be 6 times slower than the sequential scan plan, I don't know if tweaking these values enough to switch the plan choice won't cause problems for other queries.
Madison Kelly <linux@alteeve.com> writes: > So the index obiously provides a major performance boost! I just need to > figure out how to tell the planner how to use it... Be careful extrapolating too much from a single query in a single context. Notably you might want to test the same query after not touching this table for a little while. The index is probably benefiting disproportionately from having you repeatedly running this one query and having the entire table in cache. That said, you should look at lowering random_page_cost. The default is 4 but if this query is representative of your system's performance then much of your database is in cache and the effective value will be closer to 1. Try 2 or even 1.5 or 1.2. But like I said, test other queries and test under more representative conditions other than repeating a single query over and over. -- greg
Madison Kelly <linux@alteeve.com> writes:
>    So the index obiously provides a major performance boost! I just need
> to figure out how to tell the planner how to use it...
Simple division shows that the planner's cost estimate ratio between the
seqscan and the indexscan (11956.84 vs 83171.78) is off by a factor of
more than 8 compared to reality (2018.996 vs 1700.459).  Also the cost of
the sort seems to be drastically underestimated.
I suspect this may be a combination of random_page_cost being too high
(since your test case, at least, is no doubt fully cached in RAM) and
cpu_operator_cost being too low.  I'm wondering if text comparisons
are really slow on your machine --- possibly due to strcoll being
inefficient in the locale you are using, which you didn't say.  That
would account for both the seqscan being slower than expected and the
sort taking a long time.
It'd be interesting to look at the actual runtimes of this seqscan vs
one that is doing a simple integer comparison over the same number of
rows (and, preferably, returning about the same number of rows as this).
            regards, tom lane
			
		Bruno Wolff III wrote:
> On Mon, Jun 13, 2005 at 15:05:00 -0400,
>   Madison Kelly <linux@alteeve.com> wrote:
>
>>Wow!
>>
>>With the sequence scan off my query took less than 2sec. When I turned
>>it back on the time jumped back up to just under 14sec.
>>
>>
>>tle-bu=> set enable_seqscan = off; SET
>>tle-bu=> EXPLAIN ANALYZE SELECT file_name, file_parent_dir, file_display
>>FROM file_info_7 WHERE file_type='d' ORDER BY file_type ASC,
>>file_parent_dir ASC, file_name ASC;
>>
>>QUERY PLAN
>>--------------------------------------------------------------------------------------------------------------------------------------------------------
>> Index Scan using file_info_7_display_idx on file_info_7
>>(cost=0.00..83171.78 rows=25490 width=119) (actual
>>time=141.405..1700.459 rows=25795 loops=1)
>>   Index Cond: ((file_type)::text = 'd'::text)
>> Total runtime: 1851.366 ms
>>(3 rows)
>>
>>
>>tle-bu=> set enable_seqscan = on; SET
>>tle-bu=> EXPLAIN ANALYZE SELECT file_name, file_parent_dir, file_display
>>FROM file_info_7 WHERE file_type='d' ORDER BY file_type ASC,
>>file_parent_dir ASC, file_name ASC;
>>                                                         QUERY PLAN
>>----------------------------------------------------------------------------------------------------------------------------
>> Sort  (cost=14810.92..14874.65 rows=25490 width=119) (actual
>>time=13605.185..13728.436 rows=25795 loops=1)
>>   Sort Key: file_type, file_parent_dir, file_name
>>   ->  Seq Scan on file_info_7  (cost=0.00..11956.84 rows=25490
>>width=119) (actual time=0.048..2018.996 rows=25795 loops=1)
>>         Filter: ((file_type)::text = 'd'::text)
>> Total runtime: 13865.830 ms
>>(5 rows)
>>
>>  So the index obiously provides a major performance boost! I just need
>>to figure out how to tell the planner how to use it...
>
>
> The two things you probably want to look at are (in postgresql.conf):
> effective_cache_size = 10000    # typically 8KB each
> random_page_cost = 2            # units are one sequential page fetch cost
>
> Increasing effective cache size and decreasing the penalty for random
> disk fetches will favor using index scans. People have reported that
> dropping random_page_cost from the default of 4 to 2 works well.
> Effective cache size should be set to some reasonable estimate of
> the memory available on your system to postgres, not counting that
> set aside for shared buffers.
>
> However, since the planner thought the index scan plan was going to be 6 times
> slower than the sequential scan plan, I don't know if tweaking these values
> enough to switch the plan choice won't cause problems for other queries.
Hmm,
   In this case I am trying to avoid modifying 'postgres.conf' and am
trying to handle any performance tweaks within my program through SQL
calls. This is because (I hope) my program will be installed by many
users and I don't want to expect them to be able/comfortable playing
with 'postgres.conf'. I do plan later though to create a section in the
docs with extra tweaks for more advanced users and in that case I will
come back to this and try/record just that.
   In the mean time Tom's recommendation works from perl by calling:
$DB->do("SET ENABLE_SEQSCAN TO OFF") || die...
<query...>
$DB->do("SET ENABLE_SEQSCAN TO ON") || die...
   Forces the index to be used. It isn't clean but it works for now and
I don't need to do anything outside my program.
   Lacking any other ideas, thank you very, very much for sticking with
this and helping me out!
Madison
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Madison Kelly (Digimer)
TLE-BU, The Linux Experience; Back Up
http://tle-bu.thelinuxexperience.com
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
			
		Tom Lane wrote:
> Madison Kelly <linux@alteeve.com> writes:
>
>>   So the index obiously provides a major performance boost! I just need
>>to figure out how to tell the planner how to use it...
>
>
> Simple division shows that the planner's cost estimate ratio between the
> seqscan and the indexscan (11956.84 vs 83171.78) is off by a factor of
> more than 8 compared to reality (2018.996 vs 1700.459).  Also the cost of
> the sort seems to be drastically underestimated.
>
> I suspect this may be a combination of random_page_cost being too high
> (since your test case, at least, is no doubt fully cached in RAM) and
> cpu_operator_cost being too low.  I'm wondering if text comparisons
> are really slow on your machine --- possibly due to strcoll being
> inefficient in the locale you are using, which you didn't say.  That
> would account for both the seqscan being slower than expected and the
> sort taking a long time.
>
> It'd be interesting to look at the actual runtimes of this seqscan vs
> one that is doing a simple integer comparison over the same number of
> rows (and, preferably, returning about the same number of rows as this).
>
>             regards, tom lane
   This is where I should mention that though 'n00b' might be a little
harsh, I am still somewhat of a beginner (only been using postgres or
programming at all for a little over a year).
   What is, and how do I check, 'strcoll'? Is there a way that I can
clear the psql cache to make the tests more accurate to real-world
situations? For what it's worth, the program is working (I am doing
stress-testing and optimizing now) and the data in this table is actual
data, not a construct.
   As I mentioned to Bruno in my reply to him, I am trying to keep as
many tweaks as I can inside my program. The reason for this is that this
is a backup program that I am trying to aim to more mainstream users or
where a techy would set it up and then it would be used by mainstream
users. At this point I want to avoid, as best I can, any changes from
default to the 'postgres.conf' file or other external files. Later
though, once I finish this testing phase, I plan to write a section of
external tweaking where I will test these changes out and note my
success for mre advanced users who feel more comfortable playing with
postgres (and web server, rsync, etc) configs.
   If there is any way that I can make changes like this similar from
inside my (perl) program I would prefer that. For example, I implemented
the 'enable_seqscan' via:
$DB->do("SET ENABLE_SEQSCAN TO OFF") || die...
...
$DB->do("SET ENABLE_SEQSCAN TO ON") || die...
   Thank you very kindly! You and Bruno are wonderfully helpful! (as are
the other's who have replied ^_^;)
Madison
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Madison Kelly (Digimer)
TLE-BU, The Linux Experience; Back Up
http://tle-bu.thelinuxexperience.com
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
			
		On Mon, 2005-06-13 at 17:30 -0400, Madison Kelly wrote:
>    As I mentioned to Bruno in my reply to him, I am trying to keep as
> many tweaks as I can inside my program. The reason for this is that this
> is a backup program that I am trying to aim to more mainstream users or
> where a techy would set it up and then it would be used by mainstream
> users. At this point I want to avoid, as best I can, any changes from
> default to the 'postgres.conf' file or other external files. Later
> though, once I finish this testing phase, I plan to write a section of
> external tweaking where I will test these changes out and note my
> success for mre advanced users who feel more comfortable playing with
> postgres (and web server, rsync, etc) configs.
>
>    If there is any way that I can make changes like this similar from
> inside my (perl) program I would prefer that. For example, I implemented
> the 'enable_seqscan' via:
>
> $DB->do("SET ENABLE_SEQSCAN TO OFF") || die...
> ...
> $DB->do("SET ENABLE_SEQSCAN TO ON") || die...
Your goal is admirable. However, many people tweak their postgresql.conf
files, and your program can't know whether or not this has happened. It
might be a good idea to have a var $do_db_optimization, which defaults
to on. Then, if your users have trouble or are advanced admins they can
turn it off. My personal opinion is that there are too many
architectures and configurations for you to accurately optimize inside
your program, and this gives you and your users an easy out.
if ($do_db_optimization == 1) {
  $DB->do("SET ENABLE_SEQSCAN TO OFF") || die...
} else {
  # do nothing -- postgresql will figure it out
}
--
Karim Nassar <karim.nassar@acm.org>
			
		Karim Nassar wrote:
> Your goal is admirable. However, many people tweak their postgresql.conf
> files, and your program can't know whether or not this has happened. It
> might be a good idea to have a var $do_db_optimization, which defaults
> to on. Then, if your users have trouble or are advanced admins they can
> turn it off. My personal opinion is that there are too many
> architectures and configurations for you to accurately optimize inside
> your program, and this gives you and your users an easy out.
>
> if ($do_db_optimization == 1) {
>   $DB->do("SET ENABLE_SEQSCAN TO OFF") || die...
> } else {
>   # do nothing -- postgresql will figure it out
> }
That is a wonderful idea and I already have the foundation in place to
easily implement this. Thanks!!
Madison
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Madison Kelly (Digimer)
TLE-BU, The Linux Experience; Back Up
http://tle-bu.thelinuxexperience.com
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-