Querying a time range across multiple partitions

Поиск
Список
Период
Сортировка
От Cal Heldenbrand
Тема Querying a time range across multiple partitions
Дата
Msg-id CAAcwKheH7rCjdCbvyj-zEN=GZvrj2zGV9tywC-Q72BCA=T+CNA@mail.gmail.com
обсуждение исходный текст
Ответы Re: Querying a time range across multiple partitions  (Alban Hertroys <haramrae@gmail.com>)
Re: Querying a time range across multiple partitions  (John R Pierce <pierce@hogranch.com>)
Re: Querying a time range across multiple partitions  (David G Johnston <david.g.johnston@gmail.com>)
Re: Querying a time range across multiple partitions  (Jeff Janes <jeff.janes@gmail.com>)
Список pgsql-general
Hi everyone,

I'm trying to run a select query from a span of child partitions, separated out in daily tables, in Postgres 9.1.5.  The parent looks like this:

# \d logins
                 Table "public.logins"
   Column    |            Type             | Modifiers
-------------+-----------------------------+-----------
 username    | character varying(50)       |
 event       | character varying(20)       |
 time        | timestamp without time zone |
 host        | character varying(18)       |
 hash        | character varying(32)       |
Triggers:
    logins_log_trigger BEFORE INSERT ON logins FOR EACH ROW EXECUTE PROCEDURE logins_insert_trigger()
Number of child tables: 1581 (Use \d+ to list them.)

And an example child table:

# \d logins_20140904
            Table "public.logins_20140904"
   Column    |            Type             | Modifiers
-------------+-----------------------------+-----------
 username    | character varying(50)       |
 event       | character varying(20)       |
 time        | timestamp without time zone |
 host        | character varying(18)       |
 hash        | character varying(32)       |
Indexes:
    "logins_20140904_event" hash (event)
    "logins_20140904_event_time" btree (event, "time")
    "logins_20140904_username" hash (username)
    "logins_20140904_username_time" btree (username, "time")
Check constraints:
    "logins_20140904_time_check" CHECK ("time" >= '2014-09-04 00:00:00'::timestamp without time zone AND "time" <= '2014-09-04 23:59:59.99'::timestamp without time zone)
Inherits: logins


I'm attempting to run a query that looks something like this:

explain analyze select time,event from logins
  where username='bob' and hash='1234' and time > current_date - interval '1 week';

 Result  (cost=0.00..765.11 rows=1582 width=14)
   ->  Append  (cost=0.00..765.11 rows=1582 width=14)
         ->  Seq Scan on logins  (cost=0.00..0.00 rows=1 width=66)

              Filter: (((username)::text = 'bob'::text) AND ((hash)::text = '1234'::text) AND ("time" > (('now'::text)::date - '7 days'::interval)))
         ->  Index Scan using logins_20100501_username_time on logins_20100501 logins  (cost=0.01..0.48 rows=1 width=14)

               ...

This shows that it's attempting to run the query against all of my 1500 child tables. 

If I change it to manually specify dates with an IN clause, it selects the appropriate tables:

explain analyze select time,event from logins
  where username='bob' and hash='1234' and time in ('2014-09-04', '2014-09-05', '2014-09-03');

 Result  (cost=0.00..2.41 rows=3 width=31) (actual time=0.060..0.060 rows=0 loops=1)
   ->  Append  (cost=0.00..2.41 rows=3 width=31) (actual time=0.060..0.060 rows=0 loops=1)
         ->  Seq Scan on logins  (cost=0.00..0.00 rows=1 width=66) (actual time=0.001..0.001 rows=0 loops=1)

               Filter: (((username)::text = 'bob'::text) AND ((hash)::text = '1234'::text) AND ("time" = ANY ('{"2014-09-04 00:00:00","2014-09-05 00:00:00","2014-09-03 00:00:00"}'::timestamp without time zone[])))
         ->  Bitmap Heap Scan on logins_20140903 logins  (cost=1.09..1.20 rows=1 width=14) (actual time=0.039..0.039 rows=0 loops=1)

               ...

I know I could construct a query manually in my client by walking back through the calendar dates, but for my own curiosity I'd like to find out a more elegant way to run this query.

So far, I've been messing around with generating a series, then collecting it back into an array, but nothing I've tried seems to work.

A few examples:

explain analyze select time,event from logins
  where username='bob' and hash=1234' and time in (
    generate_series(current_date - interval '3 days', current_date, interval '1 day')
  );
ERROR: argument of IN must not return a set

explain analyze select time,event from logins
  where username='bob' and hash='1234' and time in (
    select array_agg(series)
    from generate_series(current_date - interval '3 days', current_date, interval '1 day')
    as u(series)
  );
ERROR: operator does not exist: timestamp without time zone = timestamp without time zone[]

explain analyze select time,event from logins
  where username='bob' and hash=1234 and time in (
    select unnest(array_agg(date_trunc('day',series))) from
    generate_series(current_date - interval '3 days', current_date, interval '1 day') as u(series)
  )

-- No errors, but still queries all child tables.

Thank you!

--Cal

В списке pgsql-general по дате отправления:

Предыдущее
От: Eildert Groeneveld
Дата:
Сообщение: count on cascading deletes
Следующее
От: Kevin Grittner
Дата:
Сообщение: Re: count on cascading deletes