Обсуждение: Need Some Explanation of an EXPLAIN
I'm trying to update a field in a table that has about 5 millin records with a table that has about 3.5 million records. I've created indexes for th a joined columns, but PostgreSQL 8.1 doesn't seem to want to use them. This makes for a very slow update. Below are descriptions of the two tables followed by the query plan that my PostgreSQL wants to use. So I have two questions: 1.) Why won't it use the indexes? 2.) How can I make this update faster? TIA, Bill Thoen Table "public.id2" Column | Type | Modifiers -----------+--------------+----------- grower_id | integer | fmid | character(7) | fsa_id | character(9) | Indexes: "id2_fsa_is_key" UNIQUE, btree (fsa_id) Table "public.growers" Column | Type | Modifiers --------------+-----------------------+----------- grower_id | integer | fsa_id | character(9) | co_name | character varying(45) | . . . Indexes: "grower_fsa_id_key" btree (fsa_id) fsa=# EXPLAIN UPDATE growers SET grower_id = id2.grower_id FROM id2 WHERE growers.fsa_id = id2.fsa_id; QUERY PLAN -------------------------------------------------------------------------- Hash Join (cost=70375.50..1603795.30 rows=6802720 width=355) Hash Cond: ("outer".fsa_id = "inner".fsa_id) -> Seq Scan on growers (cost=0.00..672373.20 rows=6802720 width=351) -> Hash (cost=46249.20..46249.20 rows=1966920 width=44) -> Seq Scan on id2 (cost=0.00..46249.20 rows=1966920 width=44) (5 rows)
On Fri, Sep 26, 2008 at 12:13 PM, Bill Thoen <bthoen@gisnet.com> wrote: > I'm trying to update a field in a table that has about 5 millin records with > a table that has about 3.5 million records. I've created indexes for th a > joined columns, but PostgreSQL 8.1 doesn't seem to want to use them. This > makes for a very slow update. 8.3 is a bit smarter on some of these types of queries, updating to that may (or may not) help in this situation. > 1.) Why won't it use the indexes? The pgsql planner chooses whether to use indexes versus sequential scans based on whether one or the other is "cheaper" to use. > 2.) How can I make this update faster? Get a faster server? > fsa=# EXPLAIN UPDATE growers > SET grower_id = id2.grower_id > FROM id2 WHERE growers.fsa_id = id2.fsa_id; > QUERY PLAN > -------------------------------------------------------------------------- > Hash Join (cost=70375.50..1603795.30 rows=6802720 width=355) > Hash Cond: ("outer".fsa_id = "inner".fsa_id) > -> Seq Scan on growers (cost=0.00..672373.20 rows=6802720 width=351) > -> Hash (cost=46249.20..46249.20 rows=1966920 width=44) > -> Seq Scan on id2 (cost=0.00..46249.20 rows=1966920 width=44) 1: Post explain analyze (if you can wait long enough for the update to finish. 2: Is the table bloated from multiple updates? Have you been vacuuming between each update you've tested? Do you have autovacuum enabled and is it aggresive enough to keep up? 3: If you want to force pgsql to use an index for testing purposes, try running this before your update query: set enable_indexscan=off; and see if it's faster or slower. Pgsql indexes don't have visibility, and this contributes to the higher cost of index scans on pgsql than some other dbs. However, sequential scans make sense if you're updating the whole table in almost any database.
"Scott Marlowe" <scott.marlowe@gmail.com> writes: > On Fri, Sep 26, 2008 at 12:13 PM, Bill Thoen <bthoen@gisnet.com> wrote: >> 2.) How can I make this update faster? > Get a faster server? Increasing work_mem might help. If the hashtable size estimate is right, you'd need something like 100MB to be sure that the join gets done in only one batch. Also, if a lot of the rows needn't actually change, it'd be worthwhile to add a WHERE clause: UPDATE growers SET grower_id = id2.grower_id FROM id2 WHERE growers.fsa_id = id2.fsa_id AND growers.grower_id != id2.grower_id regards, tom lane