Re: Performance query about large tables, lots of concurrent access

Поиск
Список
Период
Сортировка
От Karl Wright
Тема Re: Performance query about large tables, lots of concurrent access
Дата
Msg-id 4677ECA9.3000402@metacarta.com
обсуждение исходный текст
Ответ на Re: Performance query about large tables, lots of concurrent access  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: Performance query about large tables, lots of concurrent access
Re: Performance query about large tables, lots of concurrent access
Список pgsql-performance
Tom Lane wrote:
> Karl Wright <kwright@metacarta.com> writes:
>> Also, as I said before, I have done extensive query analysis and found
>> that the plans for the queries that are taking a long time are in fact
>> very reasonable.  Here's an example from the application log of a query
>> that took way more time than its plan would seem to indicate it should:
>
>> [2007-06-18 09:39:49,797]ERROR  Plan: Index Scan using i1181764142395 on
>> intrinsiclink  (cost=0.00..14177.29 rows=5 width=253)
>> [2007-06-18 09:39:49,797]ERROR  Plan:   Index Cond: ((jobid = $2) AND
>> ((childidhash)::text = ($3)::text))
>> [2007-06-18 09:39:49,797]ERROR  Plan:   Filter: ((childid = ($4)::text)
>> AND ((isnew = ($5)::bpchar) OR (isnew = ($6)::bpchar)))
>
> I see the discussion thread has moved on to consider lack-of-vacuuming
> as the main problem, but I didn't want to let this pass without
> comment.  The above plan is not necessarily good at all --- it depends
> on how many rows are selected by the index condition alone (ie, jobid
> and childidhash) versus how many are selected by the index and filter
> conditions.  If the index retrieves many rows, most of which are
> eliminated by the filter condition, it's still gonna take a long time.
>
> In this case it looks like the planner is afraid that that's exactly
> what will happen --- a cost of 14177 suggests that several thousand row
> fetches are expected to happen, and yet it's only predicting 5 rows out
> after the filter.  It's using this plan anyway because it has no better
> alternative, but you should think about whether a different index
> definition would help.
>
>             regards, tom lane
>

Well, that's odd, because the hash in question that it is using is the
SHA-1 hash of a URL.  There's essentially one row per URL in this table.
  Even with a large table I would not expect more than a couple of
collisions at most.

How does it arrive at that estimate of 14,000?

Karl


В списке pgsql-performance по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: Performance query about large tables, lots of concurrent access
Следующее
От: Tom Lane
Дата:
Сообщение: Re: Performance query about large tables, lots of concurrent access