Re: Question about difference in performance of 2 queries

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: Question about difference in performance of 2 queries
Дата
Msg-id 16345.1072726751@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Re: Question about difference in performance of 2 queries  (Sean Shanny <shannyconsulting@earthlink.net>)
Ответы Re: Question about difference in performance of 2 queries  (Sean Shanny <shannyconsulting@earthlink.net>)
Re: Question about difference in performance of 2 queries  (Nicholas Shanny <nshanny@tripadvisor.com>)
Список pgsql-performance
Sean Shanny <shannyconsulting@earthlink.net> writes:
> Here is the pg_stats data.  The explain analyze queries are still running.

> select * from pg_stats where tablename = 'f_pageviews' and attname =
> 'content_key';
>  schemaname |  tablename  |   attname   | null_frac | avg_width |
> n_distinct | most_common_vals |   most_common_freqs
> |
> histogram_bounds                                   | correlation
>
------------+-------------+-------------+-----------+-----------+------------+------------------+-----------------------+-------------------------------------------------------------------------------------+-------------
>  public     | f_pageviews | content_key |         0 |         4 |
> 983 | {-1,1528483}     | {0.749333,0.00166667} |

Oh-ho, I see the problem: about 75% of your table has content_key = -1.

Why is that a problem, you ask?  Well, the planner realizes that
"content_key > -1" is a pretty good restriction condition (better than
the date condition, apparently) and so it tries to use that as the index
scan condition.  The problem is that in 7.4 and before, the btree index
code implements a "> -1" scan starting boundary by finding the first -1
and then advancing to the first key that's not -1.  So you end up
scanning through 75% of the index before anything useful happens :-(

I just fixed this poor behavior in CVS tip a couple weeks ago:
http://archives.postgresql.org/pgsql-committers/2003-12/msg00220.php
but the patch seems too large and unproven to risk back-patching into
7.4.*.

If you expect that a pretty large fraction of your data will always have
dummy content_key, it'd probably be worth changing the index to not
index -1's at all --- that is, make it a partial index with the
condition "WHERE content_key > -1".  Another workaround is to leave the
index as-is but phrase the query WHERE condition as "content_key >= 0"
instead of "> -1".

            regards, tom lane

В списке pgsql-performance по дате отправления:

Предыдущее
От: Sean Shanny
Дата:
Сообщение: Re: Question about difference in performance of 2 queries
Следующее
От: Sean Shanny
Дата:
Сообщение: Re: Question about difference in performance of 2 queries