Re: [GENERAL] Cost: Big Tables vs. Organized Separation of Data
От | Bruce Momjian |
---|---|
Тема | Re: [GENERAL] Cost: Big Tables vs. Organized Separation of Data |
Дата | |
Msg-id | 199902032127.QAA18226@candle.pha.pa.us обсуждение исходный текст |
Список | pgsql-hackers |
> Bruce Momjian wrote: > > > > > > So, I guess my question is: how costly are joins? I've heard that > > > > Postgres pretty much "pukes" (in terms of speed) when you're trying > > > > to do anything more than 6 table joins in one query. This leads > > > > me to believe that joins are fairly costly... ???? > > > > > > i've noticed a pretty drastic slowdown going from a 4 table join > > > (instantaneous) to a 5 table join (15-20 seconds) i don't think i've > > > tried 6 tables yet with the same database. > > i'll attach the output of my query explains if you want to look at them > in detail, but here's a quick overview: all of my tables that are being > joined have 125000-200000 records, all have a primary key, and the join > is done by matching the primary key in the respective tables. i'm doing > selects on 4,5,6,7,8 tables testing both with and without GEQO. 7 and 8 > tables caused a crash (apparently out of memory), 4-6 were pretty > reasonable. they all came up with sensible plans, namely: > Nested Loop > -> Nested Loop > -> Nested Loop > -> Index Scan > -> Index Scan > -> Index Scan > -> Index Scan > with the respective levels of index scans/nested loops for each number > of tables in the join. > > although i did find some improvement by lowering the GEQO threshold to > 6, it wasn't really the answer i was hoping for. using GEQO was the > loser (time wise) each time on similar selects joining 4 and 5 tables. > GEQO came up with a better plan (lower cost), but took longer to come up > with the plan, making it the loser unless you can prepare the statement > beforehand (which would be nice and i know has been discussed already). > apparently 4 is the magic number for "instantaneous" joins in my case, 5 > gets you up to about 5 seconds (if you type the query right), and 6 is > just nasty without GEQO (it's still churning whereas with GEQO it took > ~20 seconds) So 6 was your magic number for GEQO in 6.4, though I should comment that in 6.5, we will use tables+indexes as the geqo start value, so with 5 tables and one index on each you had combined value of 10. > > so now the question is, what is the relationship between cost and how > long a query takes? for example, one of my queries show up with > cost=14387.49 size=165779 width=664(6 table join w/o GEQO); another > shows up as cost=14383.39 size=165779 width=644 (5 table join w/o GEQO) > yet the first one takes several minutes whereas the second only takes a > few seconds. i don't really know what other questions i should be > asking here. my guess is that there's some kind of memory limit that's > being hit causing the jump from 4->5 to take more time than it > apparently should causing it to take multiple (and therefore slower) > steps to get the end result. is this a possibility? or am i way off > base? [CC to hackers.] The cost really is just for comparison to other plans. Not sure you can really make any meaning out of the number, and geqo probably uses a different measurement for cost. The 4-5 setting is about what I expected, and I realize GEQO is still too slow for large joins. I am looking at what can be done with this, in trying to make non-geqo faster for joins in the 4-10 range, so perhaps we can speed those up, and use geqo only for really large joins. I hope to have something for 6.5, but am still researching. -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
В списке pgsql-hackers по дате отправления: