Discussion: Querying a time range across multiple partitions


Querying a time range across multiple partitions

From: Cal Heldenbrand
Date:
Hi everyone,

I'm trying to run a select query from a span of child partitions, separated out in daily tables, in Postgres 9.1.5.  The parent looks like this:

# \d logins
                 Table "public.logins"
   Column    |            Type             | Modifiers
-------------+-----------------------------+-----------
 username    | character varying(50)       |
 event       | character varying(20)       |
 time        | timestamp without time zone |
 host        | character varying(18)       |
 hash        | character varying(32)       |
Triggers:
    logins_log_trigger BEFORE INSERT ON logins FOR EACH ROW EXECUTE PROCEDURE logins_insert_trigger()
Number of child tables: 1581 (Use \d+ to list them.)

And an example child table:

# \d logins_20140904
            Table "public.logins_20140904"
   Column    |            Type             | Modifiers
-------------+-----------------------------+-----------
 username    | character varying(50)       |
 event       | character varying(20)       |
 time        | timestamp without time zone |
 host        | character varying(18)       |
 hash        | character varying(32)       |
Indexes:
    "logins_20140904_event" hash (event)
    "logins_20140904_event_time" btree (event, "time")
    "logins_20140904_username" hash (username)
    "logins_20140904_username_time" btree (username, "time")
Check constraints:
    "logins_20140904_time_check" CHECK ("time" >= '2014-09-04 00:00:00'::timestamp without time zone AND "time" <= '2014-09-04 23:59:59.99'::timestamp without time zone)
Inherits: logins


I'm attempting to run a query that looks something like this:

explain analyze select time,event from logins
  where username='bob' and hash='1234' and time > current_date - interval '1 week';

 Result  (cost=0.00..765.11 rows=1582 width=14)
   ->  Append  (cost=0.00..765.11 rows=1582 width=14)
         ->  Seq Scan on logins  (cost=0.00..0.00 rows=1 width=66)

              Filter: (((username)::text = 'bob'::text) AND ((hash)::text = '1234'::text) AND ("time" > (('now'::text)::date - '7 days'::interval)))
         ->  Index Scan using logins_20100501_username_time on logins_20100501 logins  (cost=0.01..0.48 rows=1 width=14)

               ...

This shows that it's attempting to run the query against all of my 1500 child tables. 

If I change it to manually specify dates with an IN clause, it selects the appropriate tables:

explain analyze select time,event from logins
  where username='bob' and hash='1234' and time in ('2014-09-04', '2014-09-05', '2014-09-03');

 Result  (cost=0.00..2.41 rows=3 width=31) (actual time=0.060..0.060 rows=0 loops=1)
   ->  Append  (cost=0.00..2.41 rows=3 width=31) (actual time=0.060..0.060 rows=0 loops=1)
         ->  Seq Scan on logins  (cost=0.00..0.00 rows=1 width=66) (actual time=0.001..0.001 rows=0 loops=1)

               Filter: (((username)::text = 'bob'::text) AND ((hash)::text = '1234'::text) AND ("time" = ANY ('{"2014-09-04 00:00:00","2014-09-05 00:00:00","2014-09-03 00:00:00"}'::timestamp without time zone[])))
         ->  Bitmap Heap Scan on logins_20140903 logins  (cost=1.09..1.20 rows=1 width=14) (actual time=0.039..0.039 rows=0 loops=1)

               ...

I know I could construct a query manually in my client by walking back through the calendar dates, but for my own curiosity I'd like to find out a more elegant way to run this query.

So far, I've been messing around with generating a series, then collecting it back into an array, but nothing I've tried seems to work.

A few examples:

explain analyze select time,event from logins
  where username='bob' and hash='1234' and time in (
    generate_series(current_date - interval '3 days', current_date, interval '1 day')
  );
ERROR: argument of IN must not return a set

explain analyze select time,event from logins
  where username='bob' and hash='1234' and time in (
    select array_agg(series)
    from generate_series(current_date - interval '3 days', current_date, interval '1 day')
    as u(series)
  );
ERROR: operator does not exist: timestamp without time zone = timestamp without time zone[]

explain analyze select time,event from logins
  where username='bob' and hash='1234' and time in (
    select unnest(array_agg(date_trunc('day',series))) from
    generate_series(current_date - interval '3 days', current_date, interval '1 day') as u(series)
  )

-- No errors, but still queries all child tables.
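The "walk back through the calendar dates" approach mentioned above can be sketched client-side. A minimal Python sketch (the helper name in_clause_for_days is hypothetical) that emits literal date strings, which the planner can see as constants and use for partition pruning:

```python
from datetime import date, timedelta

def in_clause_for_days(today, n_days):
    """Build a literal IN (...) predicate covering the last n_days dates,
    oldest first.  Literal dates are visible to the planner at plan time,
    so constraint exclusion can prune the other child tables."""
    days = [today - timedelta(days=i) for i in range(n_days - 1, -1, -1)]
    return "time in (%s)" % ", ".join("'%s'" % d.isoformat() for d in days)

clause = in_clause_for_days(date(2014, 9, 5), 3)
print(clause)
# time in ('2014-09-03', '2014-09-04', '2014-09-05')
```

The resulting clause matches the hand-written IN list that produced the good plan above.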

Thank you!

--Cal

Re: Querying a time range across multiple partitions

From: Alban Hertroys
Date:
On 05 Sep 2014, at 19:31, Cal Heldenbrand <cal@fbsdata.com> wrote:

> I'm attempting to run a query that looks something like this:
>
> explain analyze select time,event from logins
>   where username='bob' and hash='1234' and time > current_date - interval '1 week';
>
>  Result  (cost=0.00..765.11 rows=1582 width=14)
>    ->  Append  (cost=0.00..765.11 rows=1582 width=14)
>          ->  Seq Scan on logins  (cost=0.00..0.00 rows=1 width=66)
>               Filter: (((username)::text = 'bob'::text) AND ((hash)::text = '1234'::text) AND ("time" > (('now'::text)::date - '7 days'::interval)))
>          ->  Index Scan using logins_20100501_username_time on logins_20100501 logins  (cost=0.01..0.48 rows=1 width=14)
>                ...
>
> This shows that it's attempting to run the query against all of my 1500 child tables.

What about:
explain analyze select time,event from logins
  where username='bob' and hash='1234' and time > (current_date - interval '1 week')::timestamp without time zone;

Also, you don’t appear to have an index that starts with “time”, so none of the indexes will be particularly efficient at finding a specific time range. It’s quite possible that that makes PG think that “time” is not a very good candidate to filter on, simply because the optimizer doesn’t look that far.

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.



Re: Querying a time range across multiple partitions

From: John R Pierce
Date:
On 9/5/2014 10:31 AM, Cal Heldenbrand wrote:
Number of child tables: 1581

that's an insane number of children.    We try to limit it to 50 or so child tables; for instance, 6 months of retention by week, with millions of rows/day.



-- 
john r pierce                                      37N 122W
somewhere on the middle of the left coast

Re: Querying a time range across multiple partitions

From: Cal Heldenbrand
Date:
What about:
explain analyze select time,event from logins
  where username='bob' and hash='1234' and time > (current_date - interval '1 week')::timestamp without time zone;

Also, you don’t appear to have an index that starts with “time”, so none of the indexes will be particularly efficient at finding a specific time range. It’s quite possible that that makes PG think that “time” is not a very good candidate to filter on, simply because the optimizer doesn’t look that far.

That didn't seem to work either.  The thought did occur to me that the query planner wasn't using my combined column indexes.  I tried adding just a btree index on time, and it still had the same problem.

Your example query there also goes back to 2010 for scanning tables.

Interestingly enough, this query actually works:

# explain analyze select time,event from logins
   where username='bob' and hash='1234' and time > '2014-08-29';

 Result  (cost=0.00..8.21 rows=8 width=20) (actual time=0.074..0.074 rows=0 loops=1)
   ->  Append  (cost=0.00..8.21 rows=8 width=20) (actual time=0.074..0.074 rows=0 loops=1)
         ->  Seq Scan on logins  (cost=0.00..0.00 rows=1 width=66) (actual time=0.002..0.002 rows=0 loops=1)
               Filter: (("time" > '2014-08-29 00:00:00'::timestamp without time zone) AND ((username)::text = 'bob'::text) AND ((hash)::text = '1234'::text))
         ->  Index Scan using logins_20140829_username on logins_20140829 logins  (cost=0.00..1.21 rows=1 width=14) (actual time=0.021..0.021 rows=0 loops=1)
               Index Cond: ((username)::text = 'bob'::text)


So it seems to me that the query planner isn't pre-evaluating "current_date - interval", but a statically defined time span *does* work.

Doesn't that seem weird?

Re: Querying a time range across multiple partitions

From: Cal Heldenbrand
Date:
This particular use case is for user behavior data mining.  The hardware is beefy, and has tablespaces split out onto SSD/spindle for new & old data.  All of my queries are pretty much a nightly cron process, and I don't really care too much about the speed.  Scanning the full 4 years of data takes about 30 seconds per query anyway... but I thought it'd be nice to speed it up when the difference is milliseconds vs 30 seconds.


On Fri, Sep 5, 2014 at 4:17 PM, John R Pierce <pierce@hogranch.com> wrote:
On 9/5/2014 10:31 AM, Cal Heldenbrand wrote:
Number of child tables: 1581

that's an insane number of children.    We try to limit it to 50 or so child tables; for instance, 6 months of retention by week, with millions of rows/day.



-- 
john r pierce                                      37N 122W
somewhere on the middle of the left coast

Re: Querying a time range across multiple partitions

From: David G Johnston
Date:
Cal Heldenbrand wrote
> explain analyze select time,event from logins
>   where username='bob' and hash='1234' and time in (
>     select array_agg(series)
>     from generate_series(current_date - interval '3 days', current_date,
> interval '1 day')
>     as u(series)
>   );
> ERROR: operator does not exist: timestamp without time zone = timestamp
> without time zone[]
>
> explain analyze select time,event from logins

Probably doesn't help but the above would have to be written:

WHERE time = ANY(
SELECT array_agg(...) FROM ...
)

I don't know whether the planner is smart enough to optimize on "= ANY()" -
whether statically generated or determined at runtime.

I am pretty sure that constraint exclusion must be done during the planning
stages and that you have to pass in a literal set of values (or an array)
that you generate in a previous query.

David J.




--
View this message in context:
http://postgresql.1045698.n5.nabble.com/Querying-a-time-range-across-multiple-partitions-tp5817958p5817989.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


Re: Querying a time range across multiple partitions

From: Jeff Janes
Date:
On Fri, Sep 5, 2014 at 10:31 AM, Cal Heldenbrand <cal@fbsdata.com> wrote:

explain analyze select time,event from logins
  where username='bob' and hash='1234' and time > current_date - interval '1 week';

 Result  (cost=0.00..765.11 rows=1582 width=14)
   ->  Append  (cost=0.00..765.11 rows=1582 width=14)
         ->  Seq Scan on logins  (cost=0.00..0.00 rows=1 width=66)

              Filter: (((username)::text = 'bob'::text) AND ((hash)::text = '1234'::text) AND ("time" > (('now'::text)::date - '7 days'::interval)))
         ->  Index Scan using logins_20100501_username_time on logins_20100501 logins  (cost=0.01..0.48 rows=1 width=14)

               ...

This shows that it's attempting to run the query against all of my 1500 child tables. 

I believe the problem is that the planner (which does the partition pruning) is not willing to materialize the value of current_date, so it can't use a specific value to prune partitions.  After all, the date might change between the planner and the executor, if you leave the plan open for a long time, or make the call very close to midnight.

You will probably have to select the current_date (or get it from your system, or cron, or whoever triggers the script), and then hardcode it into the query.
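That hardcoding step can be sketched in client code. A minimal Python sketch (hypothetical helper; assumes psycopg-style %s placeholders for the values you still want bound at run time):

```python
from datetime import date, timedelta

def pruned_query(today):
    """Compute the cutoff client-side and splice it in as a literal, so the
    planner sees a constant it can prune partitions against, instead of the
    volatile current_date expression.  username and hash stay as bound
    parameters since they don't affect partition pruning here."""
    cutoff = today - timedelta(weeks=1)
    return ("select time, event from logins "
            "where username = %s and hash = %s "
            "and time > '" + cutoff.isoformat() + "'")

sql = pruned_query(date(2014, 9, 5))
# the cutoff literal is '2014-08-29', matching the query that pruned correctly
```

Only the date is interpolated, and it comes from datetime rather than user input, so this doesn't open an injection hole.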

Cheers,

Jeff

Re: Querying a time range across multiple partitions

From: Jeff Janes
Date:

On Fri, Sep 5, 2014 at 2:17 PM, John R Pierce <pierce@hogranch.com> wrote:
On 9/5/2014 10:31 AM, Cal Heldenbrand wrote:
Number of child tables: 1581

that's an insane number of children.    We try to limit it to 50 or so child tables; for instance, 6 months of retention by week, with millions of rows/day.

I've used more than that many for testing purposes, and there was little problem.  The main thing is that your insert trigger (if you have one on the master table) needs to be structured as a binary search-like nesting of if..elsif, not a linear-searching like structure.  Unless of course almost all inserts go into the newest partition, then it might make more sense to do the linear search with that being the first test.  But for performance, better to just insert directly into the correct child table.
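The binary-search dispatch idea can be illustrated outside PL/pgSQL. A Python sketch (partition names and boundaries are hypothetical) using bisect to locate the target child table in O(log n) comparisons, which is what the nested IF..ELSIF structure approximates:

```python
from bisect import bisect_right
from datetime import date, timedelta

# Hypothetical sorted list of daily partition start dates, oldest first.
start = date(2014, 9, 1)
boundaries = [start + timedelta(days=i) for i in range(7)]

def partition_for(ts_date):
    """Binary-search the sorted boundaries: O(log n) comparisons per insert,
    versus O(n) for a linear top-to-bottom IF..ELSIF chain."""
    i = bisect_right(boundaries, ts_date) - 1
    if i < 0:
        raise ValueError("no partition covers %s" % ts_date)
    return "logins_%s" % boundaries[i].strftime("%Y%m%d")

print(partition_for(date(2014, 9, 4)))  # logins_20140904
```

In a trigger the same shape becomes nested IF tests splitting the date range in half at each level, rather than one ELSIF per child.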

I don't know if that many tables is a good idea in this case, but it is not prima facie insane.

Cheers,

Jeff

Re: Querying a time range across multiple partitions

From: John R Pierce
Date:
On 9/7/2014 12:55 PM, Jeff Janes wrote:
On Fri, Sep 5, 2014 at 2:17 PM, John R Pierce <pierce@hogranch.com> wrote:
On 9/5/2014 10:31 AM, Cal Heldenbrand wrote:
Number of child tables: 1581

that's an insane number of children.    We try to limit it to 50 or so child tables; for instance, 6 months of retention by week, with millions of rows/day.

I've used more than that many for testing purposes, and there was little problem.  The main thing is that your insert trigger (if you have one on the master table) needs to be structured as a binary search-like nesting of if..elsif, not a linear-searching like structure.  Unless of course almost all inserts go into the newest partition, then it might make more sense to do the linear search with that being the first test.  But for performance, better to just insert directly into the correct child table.

any select that can't be preplanned to a specific child will need to check all 1500 children.     this is far less efficient than checking, say, 50 and letting the b-tree index of each child reject or narrow down to the specific row(s).  The one is roughly 1500*log(N/1500) while the other is 50*log(N/50) at least to a first order approximation.
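Plugging an assumed total row count into that approximation makes the gap concrete (N here is hypothetical, just to illustrate the arithmetic):

```python
from math import log2

N = 1.5e9  # assumed total rows across all partitions

# Cost model from the approximation above: k index probes,
# each descending a b-tree of N/k rows.
probes_1500 = 1500 * log2(N / 1500)  # un-pruned scan of 1500 children
probes_50 = 50 * log2(N / 50)        # same query over 50 children

print(round(probes_1500), round(probes_50))
# roughly a 20-25x difference in probe count for this N
```

The per-child trees are shallower with 1500 partitions, but the fixed cost of touching every child dominates.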





-- 
john r pierce                                      37N 122W
somewhere on the middle of the left coast

Re: Querying a time range across multiple partitions

From: Cal Heldenbrand
Date:
Thanks Jeff!  That's what I wanted to confirm, that I need to hard code / pregenerate my dates in the query.  I was mainly curious why it didn't work with current_date, and that answers it.

And BTW, all my inserts happen on the most recent table, so my insert trigger with a linear search is ordered date descending.  It seems to work fine since my use case is to insert data once, and never change it again.  Of course my only problem is when a select query confuses the planner, and searches my entire set.  ;-) 

Thanks,

--Cal

On Sun, Sep 7, 2014 at 2:44 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
On Fri, Sep 5, 2014 at 10:31 AM, Cal Heldenbrand <cal@fbsdata.com> wrote:

explain analyze select time,event from logins
  where username='bob' and hash='1234' and time > current_date - interval '1 week';

 Result  (cost=0.00..765.11 rows=1582 width=14)
   ->  Append  (cost=0.00..765.11 rows=1582 width=14)
         ->  Seq Scan on logins  (cost=0.00..0.00 rows=1 width=66)

              Filter: (((username)::text = 'bob'::text) AND ((hash)::text = '1234'::text) AND ("time" > (('now'::text)::date - '7 days'::interval)))
         ->  Index Scan using logins_20100501_username_time on logins_20100501 logins  (cost=0.01..0.48 rows=1 width=14)

               ...

This shows that it's attempting to run the query against all of my 1500 child tables. 

I believe the problem is that the planner (which does the partition pruning) is not willing to materialize the value of current_date, so it can't use a specific value to prune partitions.  After all, the date might change between the planner and the executor, if you leave the plan open for a long time, or make the call very close to midnight.

You will probably have to select the current_date (or get it from your system, or cron, or whoever triggers the script), and then hardcode it into the query.

Cheers,

Jeff

Re: Querying a time range across multiple partitions

From: Andreas Brandl
Date:
John,

> On Fri, Sep 5, 2014 at 2:17 PM, John R Pierce < pierce@hogranch.com >
> wrote:
>> On 9/5/2014 10:31 AM, Cal Heldenbrand wrote:
>> Number of child tables: 1581
>> that's an insane number of children. We try and limit it to 50 or so
>> child tables, for instance, 6 months retention by week, of data will
>> millions of rows/day.
>>
>> I've used more than that many for testing purposes, and there was
>> little problem. The main thing is that your insert trigger (if you
>> have one on the master table) needs to be structured as a binary
>> search-like nesting of if..elsif, not a linear-searching like
>> structure. Unless of course almost all inserts go into the newest
>> partition, then it might make more sense to do the linear search
>> with that being the first test. But for performance, better to just
>> insert directly into the correct child table.
> any select that can't be preplanned to a specific child will need to
> check all 1500 children. this is far less efficient than checking,
> say, 50 and letting the b-tree index of each child reject or narrow
> down to the specific row(s). The one is roughly 1500*log(N/1500)
> while the other is 50*log(N/50) at least to a first order
> approximation.

can you explain that further? In the end, that argument sounds like it would always be more efficient to use a single table and its index instead, rather than partitioning it (log(N) < c*log(N/c) for any c > 1, if I'm not totally lost today).

So, are there any insights on how many partitions are still useful?

I have tables with roughly 1000 partitions and did not have any issues so far. Even with INSERT rules that are lined up worst-case (from past to current, while data is always being inserted for the current date), I haven't seen any considerable degradation of INSERT performance so far.

Thanks,
Andreas


Re: Querying a time range across multiple partitions

From: John R Pierce
Date:
On 9/8/2014 1:40 PM, Andreas Brandl wrote:
> can you explain that further? In the end, that argument sounds like it would always be more efficient to use a single table and its index instead, rather than partitioning it (log(N) < c*log(N/c) for any c > 1, if I'm not totally lost today).

it indeed would.

good reasons for partitioning include...

  * efficient date based bulk deletion (we have a very large table that
has 6 months retention, so we partition by week and delete the oldest
week when a new week starts... dropping a partition is far faster than
deleting 20 million records by date)

  * needing to put data across several tablespaces - I haven't had to do
this.

  * more efficient vacuuming - really really large tables, like 100 GB,
take a LONG time to vacuum.   sane sized partitions will vacuum in less
time, and since older time-based partitions aren't typically updated,
they can be frozen.



--
john r pierce                                      37N 122W
somewhere on the middle of the left coast



Re: Querying a time range across multiple partitions

From: Jeff Janes
Date:
On Mon, Sep 8, 2014 at 1:40 PM, Andreas Brandl <ml@3.141592654.de> wrote:
John,

> On Fri, Sep 5, 2014 at 2:17 PM, John R Pierce < pierce@hogranch.com >
> wrote:
>> On 9/5/2014 10:31 AM, Cal Heldenbrand wrote:
>> Number of child tables: 1581
>> that's an insane number of children. We try and limit it to 50 or so
>> child tables, for instance, 6 months retention by week, of data will
>> millions of rows/day.
>>
>> I've used more than that many for testing purposes, and there was
>> little problem. The main thing is that your insert trigger (if you
>> have one on the master table) needs to be structured as a binary
>> search-like nesting of if..elsif, not a linear-searching like
>> structure. Unless of course almost all inserts go into the newest
>> partition, then it might make more sense to do the linear search
>> with that being the first test. But for performance, better to just
>> insert directly into the correct child table.
> any select that can't be preplanned to a specific child will need to
> check all 1500 children. this is far less efficient than checking,
> say, 50 and letting the b-tree index of each child reject or narrow
> down to the specific row(s). The one is roughly 1500*log(N/1500)
> while the other is 50*log(N/50) at least to a first order
> approximation.

can you explain that further? In the end, that argument sounds like it would always be more efficient to use a single table and its index instead, rather than partitioning it (log(N) < c*log(N/c) for any c > 1, if I'm not totally lost today).

Right.  Partitioning is NOT a generic way to improve query performance.  With rare exceptions, partitioning makes query performance worse, and then partition pruning may or may not serve to recover most of that lost performance.

Partitioning allows you to conspire with PostgreSQL to micromanage the layout of the data in ways that improve manageability and maintainability.  For example, you pay a little overhead each time you run a query and (perhaps) each time you insert a row, but in exchange for that you can "delete" a year of data with a nearly-instantaneous command.
 
Cheers,

Jeff