Discussion: why hash on the primary key?
I'm seeing a lot of plans in my database that look like this:

portal=# explain select * from foo i, foo j where i.id = j.id;
                                QUERY PLAN
-------------------------------------------------------------------------
 Hash Join  (cost=769.87..2159.36 rows=13283 width=264)
   Hash Cond: (i.id = j.id)
   ->  Seq Scan on foo i  (cost=0.00..343.83 rows=13283 width=132)
   ->  Hash  (cost=343.83..343.83 rows=13283 width=132)
         ->  Seq Scan on foo j  (cost=0.00..343.83 rows=13283 width=132)

It seems very strange for the planner to decide to build an in-memory
hash table on a column that is already indexed (the primary key, no
less!).  But this is happening A LOT - I often see plans where a
majority of the joins are executed this way (and they're not all
self-joins either...).  It seems like the planner is concluding that
it's going to need most or all of the pages in the table anyway, and
that building a hash table as it goes is quicker than reading the
index pages in from disk.  On a simple query like the above, setting
enable_seqscan to off or random_page_cost to 1 generates the expected
plan:

                                     QUERY PLAN
-------------------------------------------------------------------------------------
 Merge Join  (cost=0.00..2534.24 rows=13283 width=264)
   Merge Cond: (i.id = j.id)
   ->  Index Scan using foo_pkey on foo i  (cost=0.00..1167.50 rows=13283 width=132)
   ->  Index Scan using foo_pkey on foo j  (cost=0.00..1167.50 rows=13283 width=132)
(4 rows)

Experimentation shows this is actually about 25% faster.  But this is
kind of a blunt instrument, and my attempts to fiddle with various
parameters have not been really successful in generating better plans
for more complicated examples.

Any suggestions/explanations?

...Robert
> I'm seeing a lot of plans in my database that look like this:
>
> It seems very strange for the planner to decide to build an in-memory
> hash table on a column that is already indexed (the primary key, no
> less!).  But this is happening A LOT - I often see plans where a
> majority of the joins are executed this way (and they're not all
> self-joins either...).  It seems like the planner is concluding that
> it's going to need most or all of the pages in the table anyway, and
> that building a hash table as it goes is quicker than reading the
> index pages in from disk.  On a simple query like the above, setting
> enable_seqscan to off or random_page_cost to 1 generates the expected
> plan:
>
> Experimentation shows this is actually about 25% faster.  But, this is
> kind of a blunt instrument, and my attempts to fiddle with various
> parameters have not been real succesful in generating better plans for
> more complicated examples.
>
> Any suggestions/explanations?
>
> ...Robert

Could you send the output of these two queries using "explain analyze"
instead of plain explain?
"Robert Haas" <robertmhaas@gmail.com> writes: > It seems very strange for the planner to decide to build an in-memory > hash table on a column that is already indexed (the primary key, no > less!). What's strange about it? A probe into an in-memory hashtable is a lot cheaper than a probe into an index, so this type of plan makes plenty of sense if the hashtable will fit in RAM and there are going to be a lot of probes. (Where "a lot" means "enough to amortize the cost of building the hashtable", of course.) > Experimentation shows this is actually about 25% faster. Well, that just says your cost parameters need a bit of adjustment if you'd like the planner to get the crossover point exactly right. regards, tom lane
> Could you send the output of these two queries using "explain analyze"
> instead of plain explain?

portal=# explain analyze select * from foo i, foo j where i.id = j.id;
                                                       QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=747.87..2127.36 rows=13283 width=272) (actual time=68.434..205.536 rows=13283 loops=1)
   Hash Cond: (i.id = j.id)
   ->  Seq Scan on foo i  (cost=0.00..315.83 rows=13283 width=136) (actual time=0.024..23.349 rows=13283 loops=1)
   ->  Hash  (cost=315.83..315.83 rows=13283 width=136) (actual time=68.353..68.353 rows=13283 loops=1)
         ->  Seq Scan on foo j  (cost=0.00..315.83 rows=13283 width=136) (actual time=0.009..23.839 rows=13283 loops=1)
 Total runtime: 223.390 ms
(6 rows)

portal=# set enable_seqscan to false;
SET
portal=# explain analyze select * from foo i, foo j where i.id = j.id;
                                                             QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=0.00..2310.24 rows=13283 width=272) (actual time=0.272..149.317 rows=13283 loops=1)
   Merge Cond: (i.id = j.id)
   ->  Index Scan using foo_pkey on foo i  (cost=0.00..1055.50 rows=13283 width=136) (actual time=0.205..29.714 rows=13283 loops=1)
   ->  Index Scan using foo_pkey on foo j  (cost=0.00..1055.50 rows=13283 width=136) (actual time=0.025..37.534 rows=13283 loops=1)
 Total runtime: 166.364 ms
(5 rows)
> What's strange about it?  A probe into an in-memory hashtable is a lot
> cheaper than a probe into an index, so this type of plan makes plenty
> of sense if the hashtable will fit in RAM and there are going to be a
> lot of probes.  (Where "a lot" means "enough to amortize the cost of
> building the hashtable", of course.)

Hmm... it didn't occur to me that the index probe itself might be more
expensive than a hash probe.  Is that due to concurrency control, or
are you talking about the need to possibly read index pages in from
disk?

>> Experimentation shows this is actually about 25% faster.
>
> Well, that just says your cost parameters need a bit of adjustment
> if you'd like the planner to get the crossover point exactly right.

Any sense of which ones might be worth fiddling with?

...Robert
"Robert Haas" <robertmhaas@gmail.com> writes: >> What's strange about it? A probe into an in-memory hashtable is a lot >> cheaper than a probe into an index, so this type of plan makes plenty >> of sense if the hashtable will fit in RAM and there are going to be a >> lot of probes. (Where "a lot" means "enough to amortize the cost of >> building the hashtable", of course.) > Hmm... it didn't occur to me that the index probe itself might be > more expensive than a hash probe. Is that due to concurrency control, > or are you talking about the need to possibly read index pages in from > disk? Both, plus the loss of sequentiality of access to the table itself. >>> Experimentation shows this is actually about 25% faster. >> >> Well, that just says your cost parameters need a bit of adjustment >> if you'd like the planner to get the crossover point exactly right. > Any sense of which ones might be worth fiddling with? random_page_cost, effective_cache_size, maybe the cpu_xxx parameters. regards, tom lane
>>> Well, that just says your cost parameters need a bit of adjustment
>>> if you'd like the planner to get the crossover point exactly right.
>
>> Any sense of which ones might be worth fiddling with?
>
> random_page_cost, effective_cache_size, maybe the cpu_xxx parameters.

I fiddled with this some more on a more complex query with more than 20
joins.  It appears that the culprit is from_collapse_limit.  If I raise
from_collapse_limit, most of the hash joins turn into Nested Loops or
Nested Loop Left Joins with Index Scans.  The estimated cost is about
20% of the cost of the original plan, but the planning takes so much
longer that the actual time is 60% larger.

This is really the pits.  I had hoped that the work Simon Riggs was
doing on join removal would hit 8.4, but that doesn't seem likely at
this point.  The problem is that the application is a web app where the
user can select which columns they want to see.  Most of the time only
~6-8 columns of 40+ that are available are selected, and the joins
needed to generate the unselected columns can be tossed (which would
hopefully both avoid unnecessary join steps and also speed up planning
the remaining joins).

...Robert
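[A hedged sketch of the collapse-limit experiment described above, for the archives; the 20+-way query itself is elided, and join_collapse_limit is included on the assumption that the query mixes explicit JOIN syntax with FROM-list items:]

```sql
-- Raising the collapse limits lets the planner flatten the whole join
-- tree and search it as one problem: cheaper plans, but planning time
-- grows very fast with the number of relations considered together.
SET from_collapse_limit = 24;   -- default 8
SET join_collapse_limit = 24;   -- default 8; relevant for explicit JOINs

EXPLAIN ANALYZE SELECT ...;     -- the 20+-way join query (elided above)
```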