Re: [GENERAL] Special index for "like"-based query

Поиск
Список
Период
Сортировка
От Francisco Olarte
Тема Re: [GENERAL] Special index for "like"-based query
Дата
Msg-id CA+bJJbw3KsZAPSWTyEL2N4Eq80p-D76O6i5bun2xcEYhPvM1Eg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [GENERAL] Special index for "like"-based query  (Alban Hertroys <haramrae@gmail.com>)
Список pgsql-general
Job:

On Fri, Dec 30, 2016 at 1:01 PM, Alban Hertroys <haramrae@gmail.com> wrote:
>> On 30 Dec 2016, at 11:42, Job <Job@colliniconsulting.it> wrote:
...
>> The index applied on the timestamp field is a btree("timestamp")
...
>> select domain, sum(accessi) as c_count from TABLE where action='1' AND profile IN ('PROFILE_CODE') AND
timestamp::date BETWEEN '2016-12-27' AND '2016-12-30' AND timestamp::time BETWEEN '00:00:00' AND '23:59:59' GROUP BY
domainORDER BY c_count DESC  LIMIT 101 
...
>> Last question: the table is partitioned. I need to manually create index for every sub-tables or there is a way to
createon every sub-tables once? 

I think Alban missed this ( or I missed his response ). Yes, you need
to create the indexes for the partitions. When in doubt, create a
partition and \d+ it.

IIRC you can do 'create partition LIKE master INCLUDING indexes
INHERITS(master)', but you'll have to test. Anyway, this is normally
not too useful as the master table is normally indexless and kept
empty. I normally script the partition creation, and I woill recommend
doing that too.


Now onto the BETWEEN PROBLEM:

> It's usually more efficient to cast the constants you're comparing to, than to cast a field value for each record in
theset. The exception to that is when you have an index on the casted field. 
> In your case, since you're casting to date and time separately, and whole days even, it's probably more efficient to
combinethat into: 
> … AND timestamp BETWEEN '2016-12-27 00:00:00'::timestamp with time zone AND '2016-12-30 23:59:59'::timestamp with
timezone ... 

Alban is correct here, in both counts.

- DO NOT CAST THE COLUMNS, cast the constants ( unless you are
building a specialized index, you can build an index in cast(timestamp
as date), and it would be useful if you did a lot of queries ONLY ON
DATES ).

- DO NOT USE  CLOSED INTERVALS for real number-like columns ( remember
I told you timestamps are a point in the time line, so real-like ).
The man problems strives from the fact tht you cannot cover the real
line with non-overlapping CLOSED intervals, BETWEEN uses closed
intervals and subtle problems permeate from this fact. Math is a harsh
mistress.

( Even when working with integer-like numbers half-open intervals are
normally the best way to go in the not so short term, but between
seems so nice and natural and reads so well that even I use it where I
should not )

> But even then, you're excluding items that fall in the second between the end date and the next day. The new range
typesare useful there, for example: 
>
> … AND timestamp <@ '[2016-12-27 00:00:00, 2016-12-31 00:00:00)'::tsrange

And this is how it is solved with those new-fangled interval thingies
( I've been keying (ts>=xxx and ts <yyy), parens included, for so long
that I never remember those, but they are nearly the same. Probably
the optimizer splits it anyway.

> The above isn't entirely correct, as tsrange uses timestamp without time zone, but you get the gist.

Or use >=, < those work.

> However, if those time ranges can have other values than '[00:00. 23:59]', then you probably need 2 indexes on that
timestampcolumn; one cast to date and one to time. Otherwise, you end up creating timestamp range filters for each day
inthe range in the query (which could still be the better approach). 

Even if they have other values, single index on timestamp column is
the way to go if you only select single intervals. I mean, Xmas
morning ( data between 25 and 25 and time between 8:00 and 12:59 can
easiliy be selected by the interval [20161225T080000,
20161225T130000), but all the mornings in december can not ( although
a query with ts>='20160101' and ts <'20170101' and ts:time >='08:00'
and ts:time<'13:00' should work quite well, the first two condition
guide to an index scan and the rest is done with a filtering ).

Francisco Olarte.


В списке pgsql-general по дате отправления:

Предыдущее
От: Job
Дата:
Сообщение: R: [GENERAL] Special index for "like"-based query
Следующее
От: Michael Sheaver
Дата:
Сообщение: Re: [GENERAL] Performance PLV8 vs PLPGSQL