Discussion: Query optimization using order by and limit
First of all, thank you for taking the time to review my question. After attending the PostgresOpen conference in Chicago last week, I've been poring over explain logs for hours on end, and although my system is MUCH better, I still can't resolve a few issues. Luckily my data is pretty well structured, so solving one issue will likely solve many more; I'll start with this one.
Version: PostgreSQL 9.1rc1, compiled by Visual C++ build 1500, 64-bit
OS: Windows 7 64-bit
ORM: SQLAlchemy
Postgres table structure: I have daily partitioned tables for each of 4 "core tables" (the tables with the majority of my application's data). Each daily table inherits from its parent. I do not explicitly define a REFERENCES constraint between these tables because I cannot guarantee the order in which the events are inserted into the database, but where there are references, the referenced row should exist in the other table's daily partition. The reason I partitioned the data in this manner is to increase query speed and make it easy to archive old data. (I'm new to high-end Postgres performance, so there are likely several fundamental flaws in my assumptions. I won't turn down any recommendation.)
An example of a daily partitioned table follows:
cb=# \d osmoduleloads_2011_09_14;
Table "public.osmoduleloads_2011_09_14"
Column | Type | Modifiers
-----------------------+-----------------------------+------------------------------------------------------------
guid | numeric(20,0) | not null
osprocess_guid | numeric(20,0) | not null
filepath_guid | numeric(20,0) | not null
firstloadtime | numeric(20,0) | not null
md5hash | bytea | not null
host_guid | numeric(20,0) | default NULL::numeric
process_create_time | numeric(20,0) | default NULL::numeric
process_filepath_guid | numeric(20,0) | default NULL::numeric
event_time | timestamp without time zone | default '2011-09-14 00:00:00'::timestamp without time zone
Indexes:
"osmoduleloads_2011_09_14_pkey" PRIMARY KEY, btree (guid)
"idx_osmoduleloads_2011_09_14_filepath_guid" btree (filepath_guid)
"idx_osmoduleloads_2011_09_14_firstload_time" btree (firstloadtime)
"idx_osmoduleloads_2011_09_14_host_guid" btree (host_guid)
"idx_osmoduleloads_2011_09_14_md5hash" btree (md5hash)
"idx_osmoduleloads_2011_09_14_osprocess_guid" btree (osprocess_guid)
Check constraints:
"osmoduleloads_2011_09_14_event_time_check" CHECK (event_time = '2011-09-14 00:00:00'::timestamp without time zone)
"osmoduleloads_2011_09_14_firstloadtime_check" CHECK (firstloadtime >= 129604464000000000::bigint::numeric AND firstloadtime < 129605328000000000::bigint::numeric)
Inherits: osmoduleloads
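For readers unfamiliar with this layout: on pre-declarative-partitioning PostgreSQL (such as the 9.1 release used here), a daily partition like the one above is typically built with table inheritance plus a CHECK constraint. A minimal sketch, with the column list abbreviated and the names and bounds taken from the \d output above (not the poster's actual DDL):

```sql
-- Sketch only: how a daily child table like this is commonly created.
-- The CHECK bounds are the ones shown in the \d output above.
CREATE TABLE osmoduleloads_2011_09_14 (
    PRIMARY KEY (guid),
    CHECK (firstloadtime >= 129604464000000000::numeric
       AND firstloadtime <  129605328000000000::numeric)
) INHERITS (osmoduleloads);

-- Each child needs its own copies of the non-PK indexes, e.g.:
CREATE INDEX idx_osmoduleloads_2011_09_14_firstload_time
    ON osmoduleloads_2011_09_14 (firstloadtime);
```

Inserts are then routed to the correct child either by the application or by a trigger on the parent; the CHECK constraint is what lets the planner exclude non-matching children.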
Objective: The firstloadtime check constraint ensures that the record is applicable to that daily table. (In case you were wondering, the large numerics correspond to Windows 100-nanosecond intervals since the Epoch.) I'm inserting millions of records into each daily table, so "query slowness" is quite easy to spot. Given that there is so much data per daily table, I was hoping to use the order by and limit clauses to "stop out" a query once it satisfied the limit clause and not be forced to visit each daily table. However, I'm spending far more time in the older tables than I'd like - which leads me to believe that I'm doing something wrong. For ease of viewing, my explain analyze can be found at http://explain.depesz.com/s/tot
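As a side note, those Windows FILETIME values (100-ns ticks since 1601-01-01) can be decoded directly in SQL, which is handy when sanity-checking the constraint bounds against the query range; a quick sketch:

```sql
-- Decode a Windows FILETIME (100-ns ticks since 1601-01-01) into a timestamp.
-- Integer-dividing by 10 converts ticks to microseconds.
SELECT TIMESTAMP '1601-01-01'
       + (129604320000000000 / 10) * INTERVAL '1 microsecond';
-- 129604320000000000 works out to 2011-09-14 00:00:00.
```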
I'm still very new to this, so I'm not sure if explain.depesz.com saves the original query. It wasn't readily apparent that it did, so here is the original query:
SELECT osm_1.*, storefiles_1.*, filepaths_1.*, filepaths_2.*
FROM (SELECT * FROM osmoduleloads
      JOIN hosts ON hosts.guid = osmoduleloads.host_guid
      WHERE hosts.guid = '2007075705813916178'
        AND osmoduleloads.firstloadtime >= 129604320000000000
        AND osmoduleloads.firstloadtime < 129610367990000000
        AND hosts.enabled = true
        AND hosts.user_id = 111
      ORDER BY osmoduleloads.firstloadtime DESC
      LIMIT 251) AS osm_1
LEFT OUTER JOIN storefiles AS storefiles_1
    ON osm_1.md5hash = storefiles_1.md5hash
LEFT OUTER JOIN filepaths AS filepaths_1
    ON osm_1.process_filepath_guid = filepaths_1.guid
   AND osm_1.event_time = filepaths_1.event_time
LEFT OUTER JOIN filepaths AS filepaths_2
    ON osm_1.filepath_guid = filepaths_2.guid
   AND osm_1.event_time = filepaths_2.event_time
ORDER BY osm_1.firstloadtime DESC;
Hopefully my assumptions about order by and limit are correct and this query can be optimized.
Again, appreciate any help you can lend. Thanks in advance.
Mike
On 09/21/2011 07:14 PM, Michael Viscuso wrote:
That weird casting can't be helping. I'm not sure if it's your problem here, but the constraint exclusion code is pretty picky about matching the thing you're looking for against the CHECK constraint, and this is a messy one. The bigint conversion in the middle there isn't doing anything useful for you anyway; you really should simplify this to just look like this:
firstloadtime >= 129604464000000000::numeric
What you should start with here is confirming whether or not a simpler query touches all of the partitions or just the ones you expect it to. A simpler one like this:
SELECT * FROM osmoduleloads WHERE osmoduleloads.firstloadtime >= 129604320000000000 AND osmoduleloads.firstloadtime < 129610367990000000;
Would be the place to begin. Once you've got that working, then you can build up more pieces, and see if one of them results in the query not excluding partitions anymore or not. I can't figure out if you're running into a basic error here, where constraint exclusion just isn't working at all, or if you are only having this problem because the query is too complicated. Figuring that out will narrow the potential solutions.
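Greg's check can be made concrete like this, assuming constraint_exclusion hasn't been turned off in postgresql.conf (a sketch; the range values are the ones from the original query):

```sql
-- Confirm partitions outside the range are excluded from the plan.
-- 'partition' applies exclusion to inheritance children and is the default.
SET constraint_exclusion = partition;

EXPLAIN SELECT * FROM osmoduleloads
WHERE firstloadtime >= 129604320000000000
  AND firstloadtime <  129610367990000000;
-- Only the daily tables whose CHECK ranges overlap the WHERE range
-- should appear in the plan output.
```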
-- Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
Greg Smith <greg@2ndQuadrant.com> writes:
> That weird casting can't be helping. I'm not sure if it's your problem
> here, but the constraint exclusion code is pretty picky about matching
> the thing you're looking for against the CHECK constraint, and this is a
> messy one. The bigint conversion in the middle there isn't doing
> anything useful for you anyway; you really should simplify this to just
> look like this:
> firstloadtime >= 129604464000000000::numeric

I have a more aggressive suggestion: change all the numeric(20,0) fields to bigint. Unless the OP actually needs values wider than 64 bits, the choice to use numeric is a significant performance penalty for nothing.

regards, tom lane
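Tom's suggestion might be applied along these lines (a sketch, assuming all stored values fit in 63 bits; ALTER COLUMN TYPE on the parent recurses to inheritance children by default, and rewrites each table):

```sql
-- Sketch: convert a numeric(20,0) column to bigint across the hierarchy.
-- This takes a heavy lock and rewrites the tables, so plan for downtime.
ALTER TABLE osmoduleloads
    ALTER COLUMN firstloadtime TYPE bigint;
-- Repeat for guid, host_guid, and the other numeric(20,0) columns,
-- and remember to re-create the matching CHECK constraints without casts.
```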
Thanks guys,

First of all, I should have included my postgres.conf file with the original submission. Sorry about that. It is now attached. Based on a recommendation, I also should have shown the parent/child relationship between osmoduleloads and its daily partitioned tables. To reduce clutter, it is at the end of this message.

Taking this one step at a time, and taking Greg's second suggestion first, issuing:

select * from osmoduleloads WHERE osmoduleloads.firstloadtime >= 129604320000000000 AND osmoduleloads.firstloadtime < 129610367990000000;

appears to only query the appropriate daily tables (2011_09_13 through 2011_09_20 - http://explain.depesz.com/s/QCG). So it appears that constraint_exclusion is working properly.

Putting a limit on the query like:

select * from osmoduleloads WHERE osmoduleloads.firstloadtime >= 129604320000000000 AND osmoduleloads.firstloadtime < 129610367990000000 limit 251;

has the result that I'd expect to see: http://explain.depesz.com/s/O7fZ.

Ordering by firstloadtime AND limiting like:

select * from osmoduleloads WHERE osmoduleloads.firstloadtime >= 129604320000000000 AND osmoduleloads.firstloadtime < 129610367990000000 order by firstloadtime desc limit 251;

also has the result that I'd expect to see: http://explain.depesz.com/s/RDh.

Adding the hosts join condition to the mix was still OK: http://explain.depesz.com/s/2Ns. Adding the hosts.enabled condition was still OK: http://explain.depesz.com/s/UYN. Adding hosts.user_id = 111 started the descent, but it appears to still be obeying the proper constraint_exclusion that I'd expect, just with a ton of rows returned from the most recent daily tables: http://explain.depesz.com/s/4WE. Adding the final condition hosts_guid = '2007075705813916178' is what ultimately kills it: http://explain.depesz.com/s/8zy. By adding the host_guid, it spends considerably more time in the older tables than without this condition and I'm not sure why.
Thanks Greg for the recommendation to step through it like that - hopefully this helps get us closer to a resolution.

Greg/Tom, you are correct: these columns should be modified to whatever is easiest for Postgres to recognize as 64-bit unsigned integers. Would you still recommend bigint for unsigned integers? I likely read the wrong documentation that suggested bigint for signed 64-bit integers and numeric(20) for unsigned 64-bit integers.

Thanks again for all your help! Perhaps 15 hours of poring over explain logs will finally pan out!

Mike

cb=# \d+ osmoduleloads;
Table "public.osmoduleloads"
Column | Type | Modifiers | Storage | Description
-----------------------+-----------------------------+-----------------------+----------+-------------
guid | numeric(20,0) | not null | main |
osprocess_guid | numeric(20,0) | not null | main |
filepath_guid | numeric(20,0) | not null | main |
firstloadtime | numeric(20,0) | not null | main |
md5hash | bytea | not null | extended |
host_guid | numeric(20,0) | default NULL::numeric | main |
process_create_time | numeric(20,0) | default NULL::numeric | main |
process_filepath_guid | numeric(20,0) | default NULL::numeric | main |
event_time | timestamp without time zone | | plain |
Indexes:
"osmoduleloads_pkey" PRIMARY KEY, btree (guid)
Child tables: osmoduleloads_2001_12_31, osmoduleloads_2010_10_11, osmoduleloads_2010_10_12,
osmoduleloads_2010_10_13, osmoduleloads_2011_07_27, osmoduleloads_2011_08_04, osmoduleloads_2011_08_05,
osmoduleloads_2011_08_06, osmoduleloads_2011_08_07, osmoduleloads_2011_08_08, osmoduleloads_2011_08_09,
osmoduleloads_2011_08_10, osmoduleloads_2011_08_11, osmoduleloads_2011_08_12, osmoduleloads_2011_08_13,
osmoduleloads_2011_08_14, osmoduleloads_2011_08_15, osmoduleloads_2011_08_16, osmoduleloads_2011_08_17,
osmoduleloads_2011_08_18, osmoduleloads_2011_08_19, osmoduleloads_2011_08_20, osmoduleloads_2011_08_21,
osmoduleloads_2011_08_22, osmoduleloads_2011_08_23, osmoduleloads_2011_08_24, osmoduleloads_2011_08_25,
osmoduleloads_2011_08_26, osmoduleloads_2011_08_27, osmoduleloads_2011_08_28, osmoduleloads_2011_08_29,
osmoduleloads_2011_08_30, osmoduleloads_2011_08_31, osmoduleloads_2011_09_01, osmoduleloads_2011_09_02,
osmoduleloads_2011_09_03, osmoduleloads_2011_09_04, osmoduleloads_2011_09_05, osmoduleloads_2011_09_06,
osmoduleloads_2011_09_07, osmoduleloads_2011_09_08, osmoduleloads_2011_09_09, osmoduleloads_2011_09_10,
osmoduleloads_2011_09_11, osmoduleloads_2011_09_12, osmoduleloads_2011_09_13, osmoduleloads_2011_09_14,
osmoduleloads_2011_09_15, osmoduleloads_2011_09_16, osmoduleloads_2011_09_17, osmoduleloads_2011_09_18,
osmoduleloads_2011_09_19, osmoduleloads_2011_09_20, osmoduleloads_2011_12_01
Has OIDs: no

On 9/21/2011 10:09 PM, Tom Lane wrote:
> Greg Smith <greg@2ndQuadrant.com> writes:
>> That weird casting can't be helping. I'm not sure if it's your problem
>> here, but the constraint exclusion code is pretty picky about matching
>> the thing you're looking for against the CHECK constraint, and this is a
>> messy one. The bigint conversion in the middle there isn't doing
>> anything useful for you anyway; you really should simplify this to just
>> look like this:
>> firstloadtime >= 129604464000000000::numeric
> I have a more aggressive suggestion: change all the numeric(20,0) fields
> to bigint. Unless the OP actually needs values wider than 64 bits,
> the choice to use numeric is a significant performance penalty for
> nothing.
>
> regards, tom lane
Michael Viscuso <michael.viscuso@getcarbonblack.com> writes:
> Greg/Tom, you are correct, these columns should be modified to whatever
> is easiest for Postgres to recognize 64-bit unsigned integers. Would
> you still recommend bigint for unsigned integers? I likely read the
> wrong documentation that suggested bigint for signed 64-bit integers and
> numeric(20) for unsigned 64-bit integers.

Unsigned? Oh, hm, that's a bit of a problem because we don't have any unsigned types. If you really need to go to 2^64 and not 2^63 then you're stuck with numeric ... but that last bit is costing ya a lot.

regards, tom lane
On Wed, Sep 21, 2011 at 11:22:53PM -0400, Tom Lane wrote:
> Michael Viscuso <michael.viscuso@getcarbonblack.com> writes:
> > Greg/Tom, you are correct, these columns should be modified to whatever
> > is easiest for Postgres to recognize 64-bit unsigned integers. Would
> > you still recommend bigint for unsigned integers? I likely read the
> > wrong documentation that suggested bigint for signed 64-bit integers and
> > numeric(20) for unsigned 64-bit integers.
>
> Unsigned? Oh, hm, that's a bit of a problem because we don't have any
> unsigned types. If you really need to go to 2^64 and not 2^63 then
> you're stuck with numeric ... but that last bit is costing ya a lot.
>
> regards, tom lane

Hi Michael,

If you have access to the application, you can map the unsigned 64-bit values to the PostgreSQL signed 64-bit type with a simple subtraction. That will allow you to drop all the numeric use. Also, if the guid is a 64-bit value stuffed into a numeric(20), you can do it there as well. I achieved a hefty performance boost by making those application-level changes in a similar situation.

Regards,
Ken
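Ken's mapping trick is simply an offset by 2^63: subtracting 9223372036854775808 shifts the full unsigned range 0..2^64-1 into the signed bigint range while preserving ordering, so range predicates keep working. A hedged sketch of what that conversion could look like in SQL (the literal is one of the firstloadtime values from this thread):

```sql
-- Sketch: map an unsigned 64-bit value into signed bigint by subtracting
-- 2^63. Ordering is preserved, so index range scans behave the same.
SELECT (129604320000000000::numeric - 9223372036854775808)::bigint AS shifted;

-- To recover the original unsigned value on the way out:
--   shifted::numeric + 9223372036854775808
```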
Thanks Ken,

I'm discussing with my coworker how to best make that change *as we speak*. Do you think this will also resolve the original issue I'm seeing, where the query doesn't "limit out properly" and spends time in child tables that won't yield any results? I was hoping that by using the check constraints, I could query over a week's or month's worth of partitioned tables and the combination of order by and limit would eliminate any time searching unnecessary tables, but that doesn't appear to be true. (I'm still very new to high-end Postgres performance so I could be mistaken.)

Regardless, in the meantime, I'll switch those columns to bigint instead of numeric and have an update as soon as possible. Thanks for your help!

Mike

On 9/22/2011 9:41 AM, ktm@rice.edu wrote:
> On Wed, Sep 21, 2011 at 11:22:53PM -0400, Tom Lane wrote:
>> Michael Viscuso <michael.viscuso@getcarbonblack.com> writes:
>>> Greg/Tom, you are correct, these columns should be modified to whatever
>>> is easiest for Postgres to recognize 64-bit unsigned integers. Would
>>> you still recommend bigint for unsigned integers? I likely read the
>>> wrong documentation that suggested bigint for signed 64-bit integers and
>>> numeric(20) for unsigned 64-bit integers.
>> Unsigned? Oh, hm, that's a bit of a problem because we don't have any
>> unsigned types. If you really need to go to 2^64 and not 2^63 then
>> you're stuck with numeric ... but that last bit is costing ya a lot.
>>
>> regards, tom lane
>
> Hi Michael,
>
> If you have access to the application, you can map the unsigned 64-bits
> to the PostgreSQL signed 64-bit type with a simple subtraction. That will
> allow you to drop all the numeric use. Also if the guid is a 64-bit
> values stuffed into a numeric(20), you can do it there as well. I achieved
> a hefty performance boost by making those application level changes in a
> similar situation.
>
> Regards,
> Ken
* Michael Viscuso (michael.viscuso@getcarbonblack.com) wrote:
> Adding the final condition hosts_guid = '2007075705813916178' is what
> ultimately kills it http://explain.depesz.com/s/8zy. By adding the
> host_guid, it spends considerably more time in the older tables than
> without this condition and I'm not sure why.

What I think is happening here is that PG is pushing down that filter (not typically a bad thing..), but with that condition, it's going to scan the index until it finds a match for that filter before returning back up only to have that result cut out due to the limit. Having it as numerics isn't helping here, but the bigger issue is having to check all those tuples for a match to the filter.

Mike, the filter has to be applied before the order by/limit, since those clauses come after the filter has been applied (you wouldn't want a 'where x = 2 limit 10' to return early just because it found 10 records where x didn't equal 2).

What would be great is if PG would realize that the CHECK constraints prevent earlier records from being in these earlier tables, so it shouldn't need to consider them at all once the records from the 'latest' table have been found and the limit reached (reverse all this for an 'ascending' query, of course), which we can do when there's no order by. I don't believe we have that kind of logic or that information available at this late stage - the CHECK constraints are used to eliminate the impossible-to-match tables, but that's it.

One option, which isn't great of course, would be to implement your own 'nested loop' construct (something I typically despise..) in the application which just walks backwards from the latest and pulls whatever records it can from each day and then stops once it hits the limit.

Thanks,
Stephen
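Stephen's application-side loop can be sketched as the SQL the application would issue, with the control flow living in application code (a sketch only; the table names, host_guid, and the 251 limit are taken from earlier in the thread):

```sql
-- Step 1: query the newest daily table in the range directly.
SELECT * FROM osmoduleloads_2011_09_20
WHERE host_guid = '2007075705813916178'
  AND firstloadtime >= 129604320000000000
  AND firstloadtime <  129610367990000000
ORDER BY firstloadtime DESC
LIMIT 251;

-- Step 2 (application code): if fewer than 251 rows came back, issue the
-- same query against osmoduleloads_2011_09_19 with LIMIT reduced by the
-- number of rows already fetched, and keep walking backwards one day at a
-- time until the limit is satisfied or the range is exhausted.
```

Because each statement targets a single child table, no partition is ever touched past the point where the limit is already satisfied, which is exactly the behavior the planner could not provide here.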
Stephen,

I spent the better part of the day implementing an application layer nested loop and it seems to be working well. Of course it's a little slower than a Postgres-only solution because it has to pass data back and forth for each daily table query until it reaches the limit, but at least I don't have "runaway" queries like I was seeing before. That should be a pretty good stopgap solution for the time being.

I was really hoping there was a Postgres-exclusive answer though! :) If there are any other suggestions, it's a simple flag in my application to query the other way again...

Thanks for all your help - and I'm still looking to change those numerics to bigints, just haven't figured out the best way yet.

Mike

On 9/22/2011 10:53 AM, Stephen Frost wrote:
> * Michael Viscuso (michael.viscuso@getcarbonblack.com) wrote:
>> Adding the final condition hosts_guid = '2007075705813916178' is what
>> ultimately kills it http://explain.depesz.com/s/8zy. By adding the
>> host_guid, it spends considerably more time in the older tables than
>> without this condition and I'm not sure why.
>
> What I think is happening here is that PG is pushing down that filter
> (not typically a bad thing..), but with that condition, it's going to
> scan the index until it finds a match for that filter before returning
> back up only to have that result cut out due to the limit. Having it as
> numerics isn't helping here, but the bigger issue is having to check all
> those tuples for a match to the filter.
>
> Mike, the filter has to be applied before the order by/limit, since
> those clauses come after the filter has been applied (you wouldn't want
> a 'where x = 2 limit 10' to return early just because it found 10
> records where x didn't equal 2).
>
> What would be great is if PG would realize that the CHECK constraints
> prevent earlier records from being in these earlier tables, so it
> shouldn't need to consider them at all once the records from the
> 'latest' table has been found and the limit reached (reverse all this
> for an 'ascending' query, of course), which we can do when there's no
> order by. I don't believe we have that kind of logic or that
> information available at this late stage- the CHECK constraints are used
> to eliminate the impossible-to-match tables, but that's it.
>
> One option, which isn't great of course, would be to implement your own
> 'nested loop' construct (something I typically despise..) in the
> application which just walks backwards from the latest and pulls
> whatever records it can from each day and then stops once it hits the
> limit.
>
> Thanks,
>
> Stephen
Mike,

* Michael Viscuso (michael.viscuso@getcarbonblack.com) wrote:
> I spent the better part of the day implementing an application layer
> nested loop and it seems to be working well. Of course it's a little
> slower than a Postgres only solution because it has to pass data back
> and forth for each daily table query until it reaches the limit, but at
> least I don't have "runaway" queries like I was seeing before. That
> should be a pretty good stopgap solution for the time being.

Glad to hear that you were able to get something going which worked for you.

> I was really hoping there was a Postgres exclusive answer though! :) If
> there are any other suggestions, it's a simple flag in my application to
> query the other way again...

I continue to wonder if some combination of multi-column indexes might have made the task of finding the 'lowest' record from each of the tables fast enough that it wouldn't be an issue.

> Thanks for all your help - and I'm still looking to change those
> numerics to bigints, just haven't figured out the best way yet.

Our timestamps are also implemented using 64-bit integers and would allow you to use all the PG date/time functions and operators. Just a thought.

Thanks,
Stephen
Stephen,
Yes, I couldn't agree more. The next two things I will be looking at very carefully are the timestamps and indexes. I will reply to this post if either dramatically helps.
Thanks again for all your help. My eyes were starting to bleed from staring at explain logs!
Mike
Stephen Frost <sfrost@snowman.net> writes:
> What I think is happening here is that PG is pushing down that filter
> (not typically a bad thing..), but with that condition, it's going to
> scan the index until it finds a match for that filter before returning
> back up only to have that result cut out due to the limit.

Yeah, it's spending quite a lot of time finding the first matching row in each child table. I'm curious why that is though; are the child tables not set up with nonoverlapping firstloadtime ranges?

> What would be great is if PG would realize that the CHECK constraints
> prevent earlier records from being in these earlier tables,

The explain shows that that isn't the case, because it *is* finding at least one candidate row in each table. It's just running quite far into the firstloadtime sequence to do it.

If you're stuck with this table arrangement, one thing that would help is a two-column index on (host_guid, firstloadtime) on each child table. That would match the search condition exactly, and so reduce the cost to find the first matching row to nearly nil. Whether this query's speed is important enough to justify maintaining such an index is a question I can't answer for you.

regards, tom lane
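Tom's two-column index, spelled out for one child table (it would need to be created on each partition; the index name here is illustrative, following the naming pattern used elsewhere in the thread):

```sql
-- Matches the query's filter (host_guid = ?) and sort (firstloadtime DESC):
-- the first qualifying row per partition then sits at the edge of the index,
-- so the planner no longer has to walk deep into the firstloadtime sequence.
CREATE INDEX idx_osmoduleloads_2011_09_14_host_firstload
    ON osmoduleloads_2011_09_14 (host_guid, firstloadtime);
```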
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Yeah, it's spending quite a lot of time finding the first matching row
> in each child table. I'm curious why that is though; are the child
> tables not set up with nonoverlapping firstloadtime ranges?

They are set up w/ nonoverlapping firstloadtime ranges, using CHECK constraints such as:

"osmoduleloads_2011_09_14_firstloadtime_check" CHECK (firstloadtime >= 129604464000000000::bigint::numeric AND firstloadtime < 129605328000000000::bigint::numeric)

The issue here is that the query is saying "Give me the first 150 records with this host_id in this week-long range". PG happily eliminates all the tables that are outside of the week-long range during constraint exclusion. After that, however, it hunts down the earliest records (which match 'host_id') from each child table. Sure, from each table there's a record in the week-long range with a host_id that matches. What PG doesn't realize is that it can stop after pulling the 150 records from the most recent table (and flipping the direction of the query or the tables doesn't help - PG still pulls a record from each table).

> > What would be great is if PG would realize that the CHECK constraints
> > prevent earlier records from being in these earlier tables,
>
> The explain shows that that isn't the case, because it *is* finding at
> least one candidate row in each table. It's just running quite far into
> the firstloadtime sequence to do it.

My point above is that the CHECK constraints ensure an ordering which could be leveraged to use the latest table first and then stop if enough tuples are returned (or immediately go to the next table), without ever considering the other tables. I'm not looking for PG to eliminate those other tables from consideration in all cases - if the limit is large enough, it may get all the way down to them.

I'm pretty sure this isn't something which PG does today and I don't expect teaching it to do this to be trivial, but it certainly would be nice, as this strikes me as a very common use-case.

> If you're stuck with this table arrangement, one thing that would help
> is a two-column index on (host_guid, firstloadtime) on each child table.

Agreed, I mentioned this to the OP previously and it's on his list of things to try.

Thanks,
Stephen
Stephen Frost <sfrost@snowman.net> writes:
> * Tom Lane (tgl@sss.pgh.pa.us) wrote:
>> Yeah, it's spending quite a lot of time finding the first matching row
>> in each child table. I'm curious why that is though; are the child
>> tables not set up with nonoverlapping firstloadtime ranges?

> The issue here is that the query is saying "Give me the first 150
> records with this host_id in this week-long range".

Oh, I see. So the query range overlaps multiple child tables, even after constraint exclusion eliminates a lot of them.

> My point above is that the CHECK constraints ensure an ordering which
> could be leveraged to use the latest table first and then stop if enough
> tuples are returned (or immediately go to the next table), without ever
> considering the other tables.

Yeah. My opinion is that trying to reverse-engineer that from the CHECK constraints would cost a lot more than it's worth. What we need, and will hopefully have sooner or later, is an abstract concept of "partitioned table" in which this kind of relationship is known a priori instead of having to be laboriously re-deduced every time we plan a query.

>> If you're stuck with this table arrangement, one thing that would help
>> is a two-column index on (host_guid, firstloadtime) on each child table.

> Agreed, I mentioned this to the OP previously and it's on his list of
> things to try.

AFAICS the fact that this example would be fast if we were only paying attention to the newest table is mere luck. If it can take a long time to find the first matching host_guid record in several of the child tables, why might it not take just as long to find said record in the other one? I think you really need the two-column indexes, if keeping this query's runtime to a minimum is critical.

regards, tom lane