Обсуждение: date - range
Anybody a solution for the next problem: people can subscribe to a service for 1 or more days (upto a max. of 365). So in the database is stored: first_date and last_date To select which people are subscribed for a certain date (e.g. today) we use a select like select ....... where first_date <= today and last_date >= today Whatever index we create system always does a sequential scan (which I can understand). Has someone a smarter solution? All suggestions will be welcomed.
On Fri, Apr 01, 2005 at 12:05:44PM +0200, H.J. Sanders wrote: > > people can subscribe to a service for 1 or more days (upto a max. of 365). > > So in the database is stored: first_date and last_date > > To select which people are subscribed for a certain date (e.g. today) we use > a select like > > select ....... where first_date <= today and last_date >= today > > Whatever index we create system always does a sequential scan (which I can > understand). Could you show the table and index definitions and the EXPLAIN ANALYZE output of two queries, one with enable_seqscan set to "on" and one with it set to "off"? The planner might think that a sequential scan would be faster than an index scan, and EXPLAIN ANALYZE should tell us if that guess is correct. What version of PostgreSQL are you using? -- Michael Fuhr http://www.fuhr.org/~mfuhr/
Quoting "H.J. Sanders" <hjs@rmax.nl>: > > Anybody a solution for the next problem: > people can subscribe to a service for 1 or more days (upto a max. of 365). > So in the database is stored: first_date and last_date > To select which people are subscribed for a certain date (e.g. today) we use > a select like > > select ....... where first_date <= today and last_date >= today > > Whatever index we create system always does a sequential scan (which I can > understand). Has someone a smarter solution? Yep, standard SQL problem. The answer is sort of a hand-rolled GIST index. To save typing, I'm going to pretend all your dates are stored as integers. In reality, you'll probably be writing views with lots of EXTRACT(EPOCH...)'s in them, to achieve the same result. Suppose you have table People(id, first_date, last_date, ...) Each such range "fits" in some larger fixed range of 1,2,4, ... days that starts and ends on a fixed (epoch) date multiple of 1,2,4,... For example, if your range were days (1040..1080), then that fits in the 64-wide range (1024...1088]. You calculate the start and width of the range that just fits, and store that in People, too. Now, you index on (start,width). Now, when you want to query for a given "today", you have to try for all possible widths in People. Fortunately, that's darn few! The ranges up to a decade (!) will still mean only 16 different widths. A max range of one year (<512 days) means only 9 widths. You can do this with a tiny static table. Then: the query: SELECT People.* FROM People JOIN Widths ON People.start = today - today % Widths.width AND People.width = Widths.width Though this may look gross, it makes an index work where no normal BTree index would. I've used it for some really nasty data conversions of 100M-row tables. Your first name wouldn't be "Harlan", would it? :-) -- "Dreams come true, not free."
Quoting Mischa <mischa.Sandberg@telus.net>: [deleted] > SELECT People.* FROM People > JOIN Widths > ON People.start = today - today % Widths.width > AND People.width = Widths.width Yikes! I hit the SEND button one ohnosecend too fast. (1) You still ALSO have to test: ... AND today between first_date and last_date (2) On some SQL engines, it makes a different to how the engine can re-order the nested loops, if you make the index (width,start) instead of (start,width). Haven't tried on PG8 yet. -- "Dreams come true, not free."
On Fri, Apr 01, 2005 at 09:59:44PM -0800, Mischa wrote: > > > > select ....... where first_date <= today and last_date >= today > > > > Whatever index we create system always does a sequential scan (which I can > > understand). Has someone a smarter solution? > > Yep, standard SQL problem. The answer is sort of a hand-rolled GIST index. That might not be necessary in this case. CREATE TABLE foo ( id serial PRIMARY KEY, first_date date NOT NULL, last_date date NOT NULL, CONSTRAINT check_date CHECK (last_date >= first_date) ); /* populate table */ CREATE INDEX foo_date_idx ON foo (first_date, last_date); ANALYZE foo; EXPLAIN SELECT * FROM foo WHERE first_date <= current_date AND last_date >= current_date; QUERY PLAN -------------------------------------------------------------------------------------------- Index Scan using foo_date_idx on foo (cost=0.01..15.55 rows=97 width=12) Index Cond: ((first_date <= ('now'::text)::date) AND (last_date >= ('now'::text)::date)) (2 rows) -- Michael Fuhr http://www.fuhr.org/~mfuhr/
On Sat, Apr 02, 2005 at 00:01:31 -0700, Michael Fuhr <mike@fuhr.org> wrote: > On Fri, Apr 01, 2005 at 09:59:44PM -0800, Mischa wrote: > > > > > > select ....... where first_date <= today and last_date >= today > > > > > > Whatever index we create system always does a sequential scan (which I can > > > understand). Has someone a smarter solution? > > > > Yep, standard SQL problem. The answer is sort of a hand-rolled GIST index. > > That might not be necessary in this case. Even though you get an index scan, I don't think it is going to be very fast as there are going to be a lot of entries with first_date <= current_date. If the requests are almost always for the current date, then switching the order of columns in the index will help, since there will probably be few orders for future service, so that the current date being <= the last_date will be a much better indicator of whether they have service. If requests are made well into the past then this approach will have the same problem as checking first_date first. He will probably get faster lookups using rtree or gist indexes as he really is checking for containment. > > CREATE TABLE foo ( > id serial PRIMARY KEY, > first_date date NOT NULL, > last_date date NOT NULL, > CONSTRAINT check_date CHECK (last_date >= first_date) > ); > > /* populate table */ > > CREATE INDEX foo_date_idx ON foo (first_date, last_date); > ANALYZE foo; > > EXPLAIN SELECT * FROM foo > WHERE first_date <= current_date AND last_date >= current_date; > QUERY PLAN > -------------------------------------------------------------------------------------------- > Index Scan using foo_date_idx on foo (cost=0.01..15.55 rows=97 width=12) > Index Cond: ((first_date <= ('now'::text)::date) AND (last_date >= ('now'::text)::date)) > (2 rows)