Query across a date range
От | David Jaquay |
---|---|
Тема | Query across a date range |
Дата | |
Msg-id | ad4aa5a805010711171a3c1696@mail.gmail.com обсуждение исходный текст |
Ответы |
Re: Query across a date range
Re: Query across a date range |
Список | pgsql-performance |
Summary: Doing a two or three table join for a date range performs worse than doing the same query individually for each date in the range. What works: Doing a query just on a single date or a date range (against just one table) runs quick; 'explain' says it uses an index scan. Doing a query on a single date for one store or for one market uses all index scans, and runs quick as well. The problem: Doing a query for a date range on a particular store or market, though, for a date range of more than a few days does a sequential scan of sales_tickets, and performs worse than doing one single date query for each date. My 'explain' for one such query is below. Background: I have two or three tables involved in a query. One table is holds stores (7 rows at present), one holds sales tickets (about 5 million) and one holds line items (about 10 million). It's test data that I've generated and loaded using '\copy from'. Each has a primary key, and line items have two dates, written and delivered, that are indexed individually. Store has a market id; a market contains multiple stores (in my case, 2 or 3). Each sales ticket has 1-3 line items. Is there a way to tell postgres to use an index scan on sales_tickets? Curiously, in response to recent postings in the "Low Performance for big hospital server" thread, when I flatten the tables by putting storeid into line_items, it runs somewhat faster in all cases, and much faster in some; (I have times, if anyone is interested). Thanks, Dave mydb=> explain select * from line_items t, sales_tickets s where writtenDate >= '12/01/2002' and writtenDate <= '12/31/2002' and t.ticketId = s.ticketId and s.storeId = 1; QUERY PLAN --------------------------------------------------------------------------------------------------- Hash Join (cost=93865.46..114054.74 rows=19898 width=28) Hash Cond: ("outer".ticketId = "inner".ticketId) -> Index Scan using line_items_written on line_items t (cost=0.00..3823.11 rows=158757 width=16) Index Cond: ((writtendate >= '2002-12-01'::date) AND (writtendate <= '2002-12-31'::date)) -> Hash (cost=89543.50..89543.50 rows=626783 width=12) -> Seq Scan on sales_tickets s (cost=0.00..89543.50 rows=626783 width=12) Filter: (storeid = 1) (7 rows) mydb=> explain select * from line_items t, sales_tickets s where writtenDate = '12/01/2002' and t.ticketId = s.ticketId and s.storeid = 1; QUERY PLAN --------------------------------------------------------------------------------------------- Nested Loop (cost=0.00..16942.25 rows=697 width=28) -> Index Scan using line_items_written on soldtrx t (cost=0.00..121.97 rows=5554 width=16) Index Cond: (writtendate = '2002-12-01'::date) -> Index Scan using sales_tickets_pkey on sales_tickets s (cost=0.00..3.02 rows=1 width=12) Index Cond: ("outer".ticketId = s.ticketId) Filter: (storeid = 1) (6 rows) The tables: create table stores -- 7 rows ( storeId integer not null, marketId integer not null ); create table sales_tickets -- 500,000 rows ( ticketId integer primary key, storeId integer not null, custId integer not null ); create table line_items -- 1,000,000 rows ( lineItemId integer primary key, ticketId integer references sales_tickets, writtenDate date not null, deliveredDate date not null ); create index line_items_written on line_items (writtenDate); create index line_items_delivered on line_items (deliveredDate);
В списке pgsql-performance по дате отправления: