Обсуждение: date - range

Поиск
Список
Период
Сортировка

date - range

От
"H.J. Sanders"
Дата:
Anybody a solution for the next problem:

people can subscribe to a service for 1 or more days (upto a max. of 365).

So in the database is stored: first_date and last_date

To select which people are subscribed for a certain date (e.g. today) we use
a select like

select   ....... where first_date <= today and last_date >= today

Whatever index we create system always does a sequential scan (which I can
understand).

Has someone a smarter solution?

All suggestions will be welcomed.



Re: date - range

От
Michael Fuhr
Дата:
On Fri, Apr 01, 2005 at 12:05:44PM +0200, H.J. Sanders wrote:
>
> people can subscribe to a service for 1 or more days (upto a max. of 365).
>
> So in the database is stored: first_date and last_date
>
> To select which people are subscribed for a certain date (e.g. today) we use
> a select like
>
> select   ....... where first_date <= today and last_date >= today
>
> Whatever index we create system always does a sequential scan (which I can
> understand).

Could you show the table and index definitions and the EXPLAIN
ANALYZE output of two queries, one with enable_seqscan set to "on"
and one with it set to "off"?  The planner might think that a
sequential scan would be faster than an index scan, and EXPLAIN
ANALYZE should tell us if that guess is correct.

What version of PostgreSQL are you using?

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/

Re: date - range

От
Mischa
Дата:
Quoting "H.J. Sanders" <hjs@rmax.nl>:

>
> Anybody a solution for the next problem:
> people can subscribe to a service for 1 or more days (upto a max. of 365).
> So in the database is stored: first_date and last_date
> To select which people are subscribed for a certain date (e.g. today) we use
> a select like
>
> select   ....... where first_date <= today and last_date >= today
>
> Whatever index we create system always does a sequential scan (which I can
> understand). Has someone a smarter solution?

Yep, standard SQL problem. The answer is sort of a hand-rolled GIST index.

To save typing, I'm going to pretend all your dates are stored as integers.
In reality, you'll probably be writing views with lots of EXTRACT(EPOCH...)'s in
them, to achieve the same result.

Suppose you have table People(id, first_date, last_date, ...)
Each such range "fits" in some larger fixed range of 1,2,4, ... days
that starts and ends on a fixed (epoch) date multiple of 1,2,4,...
For example, if your range were days (1040..1080), then that fits in the
64-wide range (1024...1088]. You calculate the start and width of the range that
just fits, and store that in People, too. Now, you index on (start,width).

Now, when you want to query for a given "today", you have to try for
all possible widths in People. Fortunately, that's darn few!
The ranges up to a decade (!) will still mean only 16 different widths.
A max range of one year (<512 days) means only 9 widths.
You can do this with a tiny static table.

Then: the query:

SELECT  People.* FROM People
JOIN Widths
ON    People.start = today - today % Widths.width
AND   People.width = Widths.width

Though this may look gross, it makes an index work where no normal BTree index
would. I've used it for some really nasty data conversions of 100M-row tables.

Your first name wouldn't be "Harlan", would it? :-)
-- "Dreams come true, not free."


Re: date - range

От
Mischa
Дата:
Quoting Mischa <mischa.Sandberg@telus.net>:

[deleted]
> SELECT  People.* FROM People
> JOIN Widths
> ON    People.start = today - today % Widths.width
> AND   People.width = Widths.width

Yikes! I hit the SEND button one ohnosecend too fast.

(1) You still ALSO have to test:
... AND today between first_date and last_date

(2) On some SQL engines, it makes a different to how the engine can re-order the
nested loops, if you make the index (width,start) instead of (start,width).
Haven't tried on PG8 yet.
--
"Dreams come true, not free."


Re: date - range

От
Michael Fuhr
Дата:
On Fri, Apr 01, 2005 at 09:59:44PM -0800, Mischa wrote:
> >
> > select   ....... where first_date <= today and last_date >= today
> >
> > Whatever index we create system always does a sequential scan (which I can
> > understand). Has someone a smarter solution?
>
> Yep, standard SQL problem. The answer is sort of a hand-rolled GIST index.

That might not be necessary in this case.

CREATE TABLE foo (
    id          serial PRIMARY KEY,
    first_date  date NOT NULL,
    last_date   date NOT NULL,
    CONSTRAINT check_date CHECK (last_date >= first_date)
);

/* populate table */

CREATE INDEX foo_date_idx ON foo (first_date, last_date);
ANALYZE foo;

EXPLAIN SELECT * FROM foo
WHERE first_date <= current_date AND last_date >= current_date;
                                         QUERY PLAN
--------------------------------------------------------------------------------------------
 Index Scan using foo_date_idx on foo  (cost=0.01..15.55 rows=97 width=12)
   Index Cond: ((first_date <= ('now'::text)::date) AND (last_date >= ('now'::text)::date))
(2 rows)

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/

Re: date - range

От
Bruno Wolff III
Дата:
On Sat, Apr 02, 2005 at 00:01:31 -0700,
  Michael Fuhr <mike@fuhr.org> wrote:
> On Fri, Apr 01, 2005 at 09:59:44PM -0800, Mischa wrote:
> > >
> > > select   ....... where first_date <= today and last_date >= today
> > >
> > > Whatever index we create system always does a sequential scan (which I can
> > > understand). Has someone a smarter solution?
> >
> > Yep, standard SQL problem. The answer is sort of a hand-rolled GIST index.
>
> That might not be necessary in this case.

Even though you get an index scan, I don't think it is going to be
very fast as there are going to be a lot of entries with first_date
<= current_date. If the requests are almost always for the current date,
then switching the order of columns in the index will help, since there
will probably be few orders for future service, so that the current
date being <= the last_date will be a much better indicator of whether
they have service. If requests are made well into the past then this
approach will have the same problem as checking first_date first.
He will probably get faster lookups using rtree or gist indexes as
he really is checking for containment.

>
> CREATE TABLE foo (
>     id          serial PRIMARY KEY,
>     first_date  date NOT NULL,
>     last_date   date NOT NULL,
>     CONSTRAINT check_date CHECK (last_date >= first_date)
> );
>
> /* populate table */
>
> CREATE INDEX foo_date_idx ON foo (first_date, last_date);
> ANALYZE foo;
>
> EXPLAIN SELECT * FROM foo
> WHERE first_date <= current_date AND last_date >= current_date;
>                                          QUERY PLAN
> --------------------------------------------------------------------------------------------
>  Index Scan using foo_date_idx on foo  (cost=0.01..15.55 rows=97 width=12)
>    Index Cond: ((first_date <= ('now'::text)::date) AND (last_date >= ('now'::text)::date))
> (2 rows)