Re: [GENERAL] Problems with inequalities on numeric fields in 6.5

Поиск
Список
Период
Сортировка
От Martin Weinberg
Тема Re: [GENERAL] Problems with inequalities on numeric fields in 6.5
Дата
Msg-id 199907081544.LAA01597@osprey.phast.umass.edu
обсуждение исходный текст
Ответ на Re: [GENERAL] Problems with inequalities on numeric fields in 6.5  (Bruce Momjian <maillist@candle.pha.pa.us>)
Ответы Re: [GENERAL] Problems with inequalities on numeric fields in 6.5  (Bruce Momjian <maillist@candle.pha.pa.us>)
Список pgsql-general
Bruce Momjian wrote on Wed, 07 Jul 1999 12:29:13 EDT
>> Thanks, Bruce!
>>
>> Yes, I tried the latter query and it's the same:
>>
>> --------------------------------------------------
>>
>> lmc=> explain select count(*) from lmctot where j_m>3.4::float4 and j_m<3.5:
>:float4;
>> NOTICE:  QUERY PLAN:
>>
>> Aggregate  (cost=62349.97 rows=788100 width=4)
>>   ->  Index Scan using j on lmctot  (cost=62349.97 rows=788100 width=4)
>>
>> EXPLAIN
>>
>> --------------------------------------------------
>> I've tried all permutations of the conversions in the ranges with
>> similar results (and vacuum analyzed several times as well as
>> dumped and reloaded and reloaded from scracth).  We have
>> a larger database with 20M rows which has a similar behavior.
>>
>> There are 7092894 rows in database "lmc".  So:
>>
>> (3.5-3.4)/(99.999-2.731) = 7292.1 != 788k
>>
>> A clue?
>
>I have just fixed a problem with index size estimates.  Try adding
>#include <math.h> to the top of backend/optimizer/util/plancat.c.  That
>may fix the estimated number of tuples returned.  However, it don't
>think you are going to get better performance, since you are already
>using the index in the above case.  The only big win I can think of is
>to use CLUSTER on that field.  That should speed things up quite a bit.
>

Hi Bruce,

Ok.  Sorry about the delay.

I added the math.h but that doesn't seem to change the
query plan output.

I then dropped all the indices, made a new one on three of the
variables and clustered:

create index m_col on lmctot using btree (j_m, h_m, k_m);
cluster m_col on lmctot;
vacuum analyze;

where the j_m, h_m, k_m are three float4 fields.

The cluster took about 18 hours on my 7.1 million records
(this is a dual 450Mhz Xeon Linux box).  Not sure why
this was so slow.

Anyway, this *hugely* improved queries of form:

select count(*) from lmctot where j_m>3.4::float4 and j_m<3.5:

although the explain query plan output is identical.  However
using h_m or k_m (not the first variable in the index) appears
to be doing a sequential scan.  Is that right?

I then made indices on h_m and k_m, vacuum analyzed and tried
again, but got identical performance.  If this is the way
it is, so be it, but I have the feeling that something is
not working properly.

Any ideas?

Again, with _heaps_ of thanks,

--Martin


===========================================================================

Martin Weinberg                      Phone: (413) 545-3821
Dept. of Physics and Astronomy       FAX:   (413) 545-2117/0648
530 Graduate Research Tower
University of Massachusetts
Amherst, MA  01003-4525


В списке pgsql-general по дате отправления:

Предыдущее
От: Marino Lionello
Дата:
Сообщение: marino.lionello@bluewin.ch
Следующее
От: "Moises Rincon D'Hoyos"
Дата:
Сообщение: ...