Discussion: Good/Bad query plans based on text criteria

From:
JmH
Date:

Hello,

I am struggling to understand why changing certain criteria that I supply for a query alters the query plan. In my "good" case I can see that an index is used; in my bad case, where I only change the text value of the criterion but not the criteria themselves (i.e. I don't change or add conditions), a bitmap heap scan of the table is performed.

Refer attached Good/Bad query plans.

The basic query is:

SELECT * FROM highrate_log_entry
WHERE test_seq_number > 26668670
  AND udc = '2424'
  AND test_signal_number = 'D2030'
ORDER BY test_seq_number
LIMIT 11
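The standard way to see why the two variants behave differently is to compare EXPLAIN ANALYZE output for each, since it shows both the chosen plan and the actual row counts (a sketch using the table and columns above; swap in the "bad" signal value to compare):

```sql
-- Run once per signal value and compare plan shape and "actual rows".
EXPLAIN ANALYZE
SELECT * FROM highrate_log_entry
WHERE test_seq_number > 26668670
  AND udc = '2424'
  AND test_signal_number = 'D2030'   -- replace with the "bad" value on the second run
ORDER BY test_seq_number
LIMIT 11;
```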

test_seq_number is the primary key and is generated by a sequence.

The value 'D2030' is the only thing that I vary between good and bad runs. The issue is possibly related to the data spread for test_signal_number not being uniform, but there does not appear to be much difference in the gap between the first and last sequence numbers (needed to obtain the 11 results) when comparing the test_seq_number ranges that yield good or bad results.
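The skew in test_signal_number can be inspected directly from the statistics the planner itself uses, provided ANALYZE has run on the table (a sketch; column names as above):

```sql
-- most_common_vals/most_common_freqs show how uneven the value distribution is;
-- a very frequent value will make the planner expect many matching rows.
SELECT most_common_vals, most_common_freqs, n_distinct
FROM pg_stats
WHERE tablename = 'highrate_log_entry'
  AND attname = 'test_signal_number';
```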

I don't believe the issue is a matter of re-writing the query, but of how the planner chooses its path.

I am using Postgres 8.4 on Windows with the default postgresql.conf. I have tried increasing shared_buffers, work_mem and effective_cache_size without success.

Any suggestions would be appreciated.

Thanks

Jason
From:
Tom Lane
Date:

JmH <> writes:
> I am struggling to understand why changing certain criteria that I supply
> for a query alters the query plan. In my "good" case I can see that an
> index is used; in my bad case, where I only change the text value of the
> criterion but not the criteria themselves (i.e. I don't change or add
> conditions), a bitmap heap scan of the table is performed.

I think you're jumping to conclusions.  The second plan is processing
about 100 times as many rows, because the WHERE conditions are much less
selective.  A change in plan is entirely appropriate.

It might be that you need to change planner parameters (particularly
random_page_cost/seq_page_cost) to more nearly approximate the operating
conditions of your database, but I'd recommend being very cautious about
doing so on the basis of a small number of example queries.  In
particular it's easy to fall into the trap of optimizing for
fully-cached scenarios because repeatedly trying the same example
results in touching only already-cached data --- but that might or might
not be reflective of your whole workload.
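One cautious way to experiment with this is to change the cost parameters only for the current session and re-test, rather than editing postgresql.conf immediately (a sketch; the value shown is purely illustrative, not a recommendation):

```sql
-- Lower random_page_cost for this session only (the 8.4 default is 4.0),
-- re-run the problem query under EXPLAIN ANALYZE, then undo the change.
SET random_page_cost = 2.0;

EXPLAIN ANALYZE
SELECT * FROM highrate_log_entry
WHERE test_seq_number > 26668670
  AND udc = '2424'
  AND test_signal_number = 'D2030'
ORDER BY test_seq_number
LIMIT 11;

RESET random_page_cost;
```

Per-session SET makes the experiment reversible, which fits the advice above about not tuning globally on the basis of a few cached-data examples.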

            regards, tom lane