Re: Unexpected sequential scan on an indexed column

Поиск
Список
Период
Сортировка
От Eddy Escardo-Raffo
Тема Re: Unexpected sequential scan on an indexed column
Дата
Msg-id 4eaa4a5e0911151559y1cc43ee8md2d7fe14dfa6f2b1@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Unexpected sequential scan on an indexed column  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: Unexpected sequential scan on an indexed column
Список pgsql-performance
Thanks, Tom. I had discarded the possibility of data type mismatch already, which was your first guess, but was wondering if the lopsided distribution of location values would lead the planner to make a decision that is good on average but bad for this particular query, as you point out in your second guess.
 
I'll try populating the test users with a more evenly distributed location field, which will be more realistic anyway, and see if that works out better.
 
BTW, the -1 is not really a dummy value, but it's just a value that we have been using in tests for "fake test location ID". I just started performance measurement for my application and so far had measured performance with every user being in the same default location and things seemed to be going well, so I tried to switch a couple users to a different location and see what happened, and that made performance drop significantly.
(even more detail: my queries also limit results to 10 approx, so DB quickly found 10 rows that match location -1, but it took a while to discover there weren't more than 2 rows with the other value).
 
Thanks!
Eddy

On Sun, Nov 15, 2009 at 3:33 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Eddy Escardo-Raffo <eescardo@kikini.com> writes:
> The table used in this query is called "users", and it has columns "userid"
> (primary key) and "location".
> The "location" column is indexed.
> The users table has 1 million rows, and all rows have integer typed value
> '-1' for  "location" column, except for 2 rows that have the integer value
> '76543'.

Oh, after poking at it a bit more, I realize the problem: the planner
doesn't want to use an indexscan because it assumes there's a
significant probability that the search will be for -1 (in which case
the indexscan would be slower than a seqscan, as indeed your results
prove).  Even though it could know in this particular case that the
comparison value isn't -1, I doubt that teaching it that would help your
real queries where it will probably be impossible to determine the
comparison values in advance.

I would suggest considering using NULL rather than inventing a dummy
value for unknown locations.  The estimation heuristics will play a
lot nicer with that choice.

                       regards, tom lane

В списке pgsql-performance по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: Unexpected sequential scan on an indexed column
Следующее
От: Eddy Escardo-Raffo
Дата:
Сообщение: Re: Unexpected sequential scan on an indexed column