Discussion: Long Running Update
Hello

I am attempting to run an update statement that copies two fields from one table to another:

UPDATE table_A
SET (field_1, field_2) = (table_B.field_1, table_B.field_2)
FROM table_B
WHERE table_B.id = table_A.id;

Table "table_B" contains almost 75 million records, with IDs that match those in "table_A". Both "field_1" and "field_2" are DOUBLE PRECISION. The ID fields are SERIAL primary-key integers in both tables.

I tested (the logic of) this statement with a very small sample, and it worked correctly.

The database runs on a dedicated Debian server in our office. I called both VACUUM and ANALYZE on the database before invoking this statement.

The statement has been running for 18+ hours so far. TOP, FREE and VMSTAT utilities indicate that only about half of the 6GB of memory is being used, so I have no reason to believe that the server is struggling.

My question is: can I reasonably expect a statement like this to complete with such a large data-set, even if it takes several days? We do not mind waiting, but obviously we do not want to wait unnecessarily.

Many thanks.

Harry Mantheakis
London, UK
On Thu, Jun 23, 2011 at 5:05 PM, Harry Mantheakis <harry.mantheakis@riskcontrollimited.com> wrote:
> TOP, FREE and VMSTAT utilities indicate that only about half of the 6GB of
> memory is being used, so I have no reason to believe that the server is
> struggling.

You have a hinky idea of server load. Mind you, there are lots of ways in which it could be struggling other than memory usage. Like IO, CPU, lock contention...

In my experience, such huge updates struggle a lot with fsync and random I/O when updating the indices. It will be a lot faster if you can drop all indices (including the PK).
Harry Mantheakis <harry.mantheakis@riskcontrollimited.com> wrote: > UPDATE > table_A > SET > ( > field_1 > , field_2 > ) = ( > table_B.field_1 > , table_B.field_2 > ) > FROM > table_B > WHERE > table_B.id = table_A.id > ; I would have just done: SET field_1 = table_B.field_1, field_2 = table_B.field_2 instead of using row value constructors. That might be slowing things down a bit. > I tested (the logic of) this statement with a very small sample, > and it worked correctly. Always a good sign. :-) > The statement has been running for 18+ hours so far. > My question is: can I reasonably expect a statement like this to > complete with such a large data-set, even if it takes several > days? If it's not leaking memory, I expect that it will complete. To get some sense of what it's doing, you could log on to another connection and EXPLAIN the statement. (NOTE: Be careful *not* to use EXPLAIN ANALYZE.) Another thing to consider if you run something like this again is that an UPDATE is an awful lot like an INSERT combined with a DELETE. The way PostgreSQL MVCC works, the old version of each row must remain until the updating transaction completes. If you were to divide this update into a series of updates by key range, the new versions of the rows from later updates could re-use the space previously occupied by the old version of rows from earlier updates. For similar reasons, you might want to add something like this to your WHERE clause, to prevent unnecessary updates: AND (table_B.field_1 IS DISTINCT FROM table_A.field_1 OR table_B.field_2 IS DISTINCT FROM table_A.field_2); -Kevin
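Putting Kevin's two suggestions together, one key-range slice of the update might look like the sketch below (the 100,000-row range is illustrative, and the IS DISTINCT FROM guard skips rows that already hold the right values):

```sql
-- One batch of the update, bounded by key range, per Kevin's suggestion.
-- Batch bounds are illustrative; repeat with the next range until done.
UPDATE table_A
SET field_1 = table_B.field_1,
    field_2 = table_B.field_2
FROM table_B
WHERE table_B.id = table_A.id
  AND table_A.id >= 0 AND table_A.id < 100000      -- one key-range slice
  AND (table_B.field_1 IS DISTINCT FROM table_A.field_1
       OR table_B.field_2 IS DISTINCT FROM table_A.field_2);
```

Because each batch commits separately, later batches can reuse the space freed by the dead row versions left behind by earlier ones.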
Thank you Kevin.

> SET field_1 = table_B.field_1, field_2 = table_B.field_2

I will try that, if I have to, next time.

> add something like this to your WHERE clause,
> to prevent unnecessary updates:
>
> AND (table_B.field_1 IS DISTINCT FROM table_A.field_1
> OR table_B.field_2 IS DISTINCT FROM table_A.field_2);

Thank you for that explanation - I will keep that in mind in future. (In this case, the two fields that are being updated are all known to be empty - hence, distinct - in the target table.)

> EXPLAIN the statement

Here is the EXPLAIN result:

----------------------------------------------------------------------
QUERY PLAN
----------------------------------------------------------------------
Hash Join  (cost=2589312.08..16596998.47 rows=74558048 width=63)
  Hash Cond: (table_A.id = table_B.id)
  ->  Seq Scan on table_A  (cost=0.00..1941825.05 rows=95612705 width=47)
  ->  Hash  (cost=1220472.48..1220472.48 rows=74558048 width=20)
        ->  Seq Scan on table_B  (cost=0.00..1220472.48 rows=74558048 width=20)
----------------------------------------------------------------------

The documentation says the 'cost' numbers are 'units of disk page fetches'. Do you, by any chance, have any notion of how many disk page fetches can be processed per second in practice - at least a rough idea? IOW how do I convert - guesstimate! - these numbers into (plausible) time values?

Kind regards

Harry Mantheakis
London, UK
Thank you Claudio. > there are lots of ways in which it could be struggling... I have been monitoring the server with IOSTAT -d and IOSTAT -c and I cannot see anything alarming. > It will be a lot faster if you can drop all indices... This is counter-intuitive - because the WHERE clause is matching the only two indexed fields, and my understanding is that querying on indexed fields is faster than querying on fields that are not indexed. (Note also, that the indexed field is NOT being updated.) But if this update fails, I shall try what you suggest! Kind regards Harry Mantheakis London, UK
On Fri, Jun 24, 2011 at 1:19 PM, Harry Mantheakis <harry.mantheakis@riskcontrollimited.com> wrote:
>
>> there are lots of ways in which it could be struggling...
>
> I have been monitoring the server with IOSTAT -d and IOSTAT -c and I cannot
> see anything alarming.

If iostat doesn't show disk load, either iostat doesn't work well (which could be the case, I've found a few broken installations here and there), or, perhaps, your update is waiting on some other update.

I've seen cases where there are application-level deadlocks (i.e., deadlocks that the database alone cannot detect), and then your queries stall like that. It happens quite frequently if you try such a massive update on a loaded production server. In those cases, the technique someone mentioned (updating in smaller batches) usually works nicely.

You should be able to see if it's locked waiting for something with "select * from pg_stat_activity".
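A slightly more targeted version of that check might look like this (a sketch using the pg_stat_activity column names of the 8.x/9.0-era servers current when this thread was written; newer releases renamed procpid/current_query to pid/query):

```sql
-- Show only the sessions that are blocked waiting on a lock.
SELECT procpid, usename, waiting, current_query
FROM pg_stat_activity
WHERE waiting;
```

If the big UPDATE's backend shows waiting = true, joining against pg_locks will reveal which lock it is queued behind.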
On 24.06.11 14:16, Harry Mantheakis wrote:
>
> > EXPLAIN the statement
>
> Here is the EXPLAIN result:
>
> ----------------------------------------------------------------------
> QUERY PLAN
> ----------------------------------------------------------------------
> Hash Join  (cost=2589312.08..16596998.47 rows=74558048 width=63)
>   Hash Cond: (table_A.id = table_B.id)
>   ->  Seq Scan on table_A  (cost=0.00..1941825.05 rows=95612705 width=47)
>   ->  Hash  (cost=1220472.48..1220472.48 rows=74558048 width=20)
>         ->  Seq Scan on table_B  (cost=0.00..1220472.48 rows=74558048 width=20)
> ----------------------------------------------------------------------
>
> The documentation says the 'cost' numbers are 'units of disk page
> fetches'.
>
> Do you, by any chance, have any notion of how many disk page fetches
> can be processed per second in practice - at least a rough idea?
>
> IOW how do I convert - guesstimate! - these numbers into (plausible)
> time values?

No chance. These are "virtual values" for the planner only.

If I read correctly, your query should go through two phases: build a hash map on one table, then update the second table using the map. Note that this is all valid only if you have no constraints (including foreign keys, on both sides) to check on any field of the updated table. If you have any, you'd better drop them. Anyway, this is two seq. scans.

For a long query I use a tool like ktrace (FreeBSD) to see which read/write system calls the backend is doing. Then, with the catalog tables, you can map file names to relations (tables/indexes). Then you can see which stage you are on and how fast it is going.

Note that partially cached tables are awful (in FreeBSD, dunno for Linux) for such a query - I suppose this is because instead of a sequential read, you get a lot of random reads that fool the prefetch logic. "dd if=table_file of=/dev/null bs=8m" helps me a lot. You can see if it helps if CPU time goes up.

Best regards,
Vitalii Tymchyshyn
Thanks again Claudio.

It does not look like it's locked/blocked - but your hint about doing this task in smaller batches is a good one, and it would be easy enough to organise.

I am going to let this task run over the week-end, and then decide. Either way, I shall update this thread.

Much obliged!

Harry Mantheakis
London, UK
On 23/06/11 16:05, Harry Mantheakis wrote: > Hello > > I am attempting to run an update statement that copies two fields from > one table to another: > > > UPDATE > table_A > SET > ( > field_1 > , field_2 > ) = ( > table_B.field_1 > , table_B.field_2 > ) > FROM > table_B > WHERE > table_B.id = table_A.id > ; > I frequently get updates involving a FROM clause wrong --- the resulting table is correct but the running time is quadratic. You might want to try a series of smaller examples to see if your query displays this behaviour. Mark Thornton
Mark Thornton <mthornton@optrak.com> wrote: > On 23/06/11 16:05, Harry Mantheakis wrote: >> UPDATE >> table_A >> [ ... ] >> FROM >> table_B >> WHERE >> table_B.id = table_A.id > I frequently get updates involving a FROM clause wrong --- the > resulting table is correct but the running time is quadratic. The most frequent way I've seen that happen is for someone to do: UPDATE table_A [ ... ] FROM table_A a, table_B b WHERE b.id = a.id Because a FROM clause on an UPDATE statement is not in the standard, different products have implemented this differently. In Sybase ASE or Microsoft SQL Server you need to do the above to alias table_A, and the two references to table_A are treated as one. In PostgreSQL this would be two separate references and you would effectively be doing the full update of all rows in table_A once for every row in table_A. I don't think that is happening here based on the plan posted earlier in the thread. -Kevin
Harry Mantheakis <harry.mantheakis@riskcontrollimited.com> wrote:

> IOW how do I convert - guesstimate! - these numbers into
> (plausible) time values?

They are abstract, and it only matters that they are in the right ratios to one another so that the planner can accurately pick the cheapest plan. With the default settings, seq_page_cost is 1, so if everything is tuned perfectly, the run time should match the time it takes to sequentially read a number of pages (normally 8KB) which matches the estimated cost. So with 8KB pages and seq_page_cost = 1, the cost number says it should take the same amount of time as a sequential read of 130 GB.

The biggest reason this won't be close to actual run time is that the planner just estimates the cost of *getting to* the correct tuples for update, implicitly assuming that the actual cost of the updates will be the same regardless of how you find the tuples to be updated. So if your costs were set in perfect proportion to reality, with seq_page_cost = 1, the above would tell you how fast a SELECT of the data to be updated should be. The cost numbers don't really give a clue about the time to actually UPDATE the rows.

-Kevin
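The arithmetic behind that 130 GB figure can be checked directly in SQL, assuming the default 8 KB page size and seq_page_cost = 1:

```sql
-- Total plan cost 16596998.47 ~ that many 8 KB page fetches,
-- so the equivalent sequential-read volume is roughly:
SELECT pg_size_pretty((16596998.47 * 8192)::bigint);  -- roughly 127 GB
```

Note again that this only bounds the read side; it says nothing about the cost of writing the new row versions and index entries.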
Harry Mantheakis <harry.mantheakis@riskcontrollimited.com> wrote:

>> It will be a lot faster if you can drop all indices...
>
> This is counter-intuitive - because the WHERE clause is matching
> the only two indexed fields, and my understanding is that querying
> on indexed fields is faster than querying on fields that are not
> indexed.

Because your UPDATE requires reading every tuple in every page of both tables, it would be counter-productive to use the indexes. Random access is much slower than sequential, so it's fastest to just blast through both tables sequentially. The plan you showed has it scanning through table_B and loading the needed data into RAM in a hash table, then scanning through table_A and updating each row based on what is in the RAM hash table. For each row updated, if it isn't a HOT update, a new entry must be inserted into every index on table_A, so dropping the indexes before the update and re-creating them afterward would probably be a very good idea if you're going to do the whole table in one go, and possibly even if you're working in smaller chunks.

One thing which might help run time a lot, especially since you mentioned having a lot of unused RAM, is to run the update with a very high work_mem setting in the session running the UPDATE.

> (Note also, that the indexed field is NOT being updated.)

That's one of the conditions for a HOT update. The other is that there is space available on the same page for a new version of the row, in addition to the old version.

-Kevin
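A sketch of that advice in session form (the index name table_a_field_1_idx is hypothetical; substitute whatever non-PK indexes actually exist on table_A, and size work_mem to your free RAM):

```sql
-- Drop secondary indexes so the bulk UPDATE doesn't maintain them row by row.
DROP INDEX IF EXISTS table_a_field_1_idx;   -- hypothetical index name

-- Session-local: give the hash join a much bigger memory budget.
SET work_mem = '1GB';

-- ... run the big UPDATE here ...

-- Rebuild the dropped indexes afterward in one sequential pass.
CREATE INDEX table_a_field_1_idx ON table_A (field_1);
```

Rebuilding an index once at the end is a single sorted bulk operation, which is far cheaper than 75 million incremental index insertions.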
Thank you so much for all the feedback, Kevin - much appreciated. I have stopped the all-in-one-go update from executing, and now I am executing a series of statements, each constrained to update no more than 100,000 records at a time. Interesting fact: updating 100,000 rows takes 5 seconds. Quick. I tried updating 1 million rows in one go, and the statement was still running after 25 minutes, before I killed it! So I am now executing an SQL script with almost 800 separate update-statements, each set to update 100K records, and the thing is trundling along nicely. (I am fortunate enough to be surrounded by MatLab users, one of whom generated the 800-statement file in one minute flat!) Many thanks again for all the info. Kind regards Harry Mantheakis London, UK
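For anyone without a MatLab user to hand, the batch file can also be generated in SQL itself. This is a sketch only: it assumes, as in this thread, roughly 80 million IDs that are dense from zero, in slices of 100,000:

```sql
-- Emit one UPDATE statement per 100,000-ID slice; redirect psql's output
-- to a file and run it as a script.
SELECT 'UPDATE table_A SET field_1 = table_B.field_1, field_2 = table_B.field_2'
    || ' FROM table_B WHERE table_B.id = table_A.id'
    || ' AND table_A.id >= ' || n
    || ' AND table_A.id < '  || (n + 100000) || ';'
FROM generate_series(0, 79900000, 100000) AS n;
```

Each generated statement commits independently, so dead row versions from earlier slices can be reused by later ones.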
> try a series of smaller examples...

Mark, that was the tip that saved me! Many thanks.

Harry Mantheakis
London, UK
Harry Mantheakis <harry.mantheakis@riskcontrollimited.com> wrote:

> I have stopped the all-in-one-go update from executing, and now I
> am executing a series of statements, each constrained to update no
> more than 100,000 records at a time.
>
> Interesting fact: updating 100,000 rows takes 5 seconds. Quick.

One last thing -- all these updates, including the aborted attempt at a single-pass update, may cause significant bloat in both the heap and the index(es). I usually finish up with a CLUSTER on the table, followed by a VACUUM ANALYZE on the table.

-Kevin
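Concretely, that clean-up might look like this (the index name table_a_pkey is assumed; CLUSTER rewrites the whole table and takes an exclusive lock for the duration):

```sql
-- Rewrite table_A in primary-key order, reclaiming bloat from the updates.
CLUSTER table_A USING table_a_pkey;

-- Refresh planner statistics and visibility info on the rewritten table.
VACUUM ANALYZE table_A;
```

Plan for the rewrite to need roughly as much free disk as the table itself while it runs.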
I called VACUUM on the database after abandoning the original all-in-one update. And I have a daily cron script that executes the following statement: sudo -u postgres /usr/bin/vacuumdb -U postgres --all --analyze But I had not considered using CLUSTER - I will certainly look into that. Thanks again. Harry Mantheakis London, UK
I am glad to report that the 'salami-slice' approach worked nicely - all done in about 2.5 hours.

Instead of using an all-in-one-go statement, we executed 800 statements, each updating 100,000 records. On average it took about 10 seconds for each statement to return.

This is a "thinking out of the box" solution, which others might not be able to emulate.

The mystery remains, for me: why updating 100,000 records could complete in as quickly as 5 seconds, whereas an attempt to update a million records was still running after 25 minutes before we killed it?

One thing remains crystal clear: I love Postgresql :-)

Kind regards

Harry Mantheakis
London, UK
Harry Mantheakis <harry.mantheakis@riskcontrollimited.com> wrote: > I am glad to report that the 'salami-slice' approach worked nicely > - all done in about 2.5 hours. Glad to hear it! > The mystery remains, for me: why updating 100,000 records could > complete in as quickly as 5 seconds, whereas an attempt to update > a million records was still running after 25 minutes before we > killed it? If you use EXPLAIN with both statements (without ANALYZE, since you probably don't want to trigger an actual *run* of the statement), is there a completely different plan for the range covering each? If so, a severe jump like that might mean that your costing parameters could use some adjustment, so that it switches from one plan to the other closer to the actual break-even point. > One thing remains crystal clear: I love Postgresql :-) :-) -Kevin
> The mystery remains, for me: why updating 100,000 records could complete
> in as quickly as 5 seconds, whereas an attempt to update a million
> records was still running after 25 minutes before we killed it?

Hi, there are a lot of possible causes. Usually this is caused by a plan change - imagine, for example, that you need to sort a table and the amount of data just fits into work_mem, so that it can be sorted in memory. If you need to perform the same query with 10x the data, you'll have to sort the data on disk. Which is way slower, of course. And there are other such problems ...

> One thing remains crystal clear: I love Postgresql :-)

regards
Tomas
On Mon, Jun 27, 2011 at 5:37 PM, <tv@fuzzy.cz> wrote: >> The mystery remains, for me: why updating 100,000 records could complete >> in as quickly as 5 seconds, whereas an attempt to update a million >> records was still running after 25 minutes before we killed it? > > Hi, there's a lot of possible causes. Usually this is caused by a plan > change - imagine for example that you need to sort a table and the amount > of data just fits into work_mem, so that it can be sorted in memory. If > you need to perform the same query with 10x the data, you'll have to sort > the data on disk. Which is way slower, of course. > > And there are other such problems ... I would rather assume it is one of the "other problems", typically related to handling the TX (e.g. checkpoints, WAL, creating copies of modified records and adjusting indexes...). Kind regards robert -- remember.guy do |as, often| as.you_can - without end http://blog.rubybestpractices.com/
Harry Mantheakis wrote: > The mystery remains, for me: why updating 100,000 records could > complete in as quickly as 5 seconds, whereas an attempt to update a > million records was still running after 25 minutes before we killed it? The way you were doing this originally, it was joining every record in table A against every record in table B, finding the matches (note the sequential scans on each in the query plan you showed). Having A * B possible matches there was using up a bunch of resources to line those two up for an efficient join, and it's possible that parts of that required spilling working data over to disk and other expensive operations. And you were guaranteeing that every row in each table was being processed in some way. Now, when you only took a small slice of A instead, and a small slice of B to match, this was likely using an index and working with a lot less rows in total--only ones in B that mattered were considered, not every one in B. And each processing slice was working on less rows, making it more likely to fit in memory, and thus avoiding both slow spill to disk operation and work that was less likely to fit into the system cache. I don't know exactly how much of each of these two components went into your large run-time difference, but I suspect both were involved. The way the optimizer switches to using a sequential scan when doing bulk operations is often the right move. But if it happens in a way that causes the set of data to be processed to become much larger than RAM, it can be a bad decision. The performance drop when things stop fitting in memory is not a slow one, it's like a giant cliff you fall off. -- Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us "PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books
Hello Kevin

> If you use EXPLAIN with both statements...

Yes, the plans are indeed very different. Here is the statement, set to update up to 100,000 records, which took about 5 seconds to complete:

UPDATE table_A
SET field_1 = table_B.field_1
  , field_2 = table_B.field_2
FROM table_B
WHERE table_B.tb_id >= 0
AND table_B.tb_id <= 100000
AND table_B.tb_id = table_A.ta_id
;

The query plan for the above is:

Nested Loop  (cost=0.00..2127044.47 rows=73620 width=63)
  ->  Index Scan using table_B_pkey on table_B  (cost=0.00..151830.75 rows=73620 width=20)
        Index Cond: ((tb_id >= 0) AND (tb_id <= 100000))
  ->  Index Scan using table_A_pkey on table_A  (cost=0.00..26.82 rows=1 width=47)
        Index Cond: (table_A.ta_id = table_B.tb_id)

Now, if I change the first AND clause to update 1M records, as follows:

table_B.tb_id <= 1000000

I get the following - quite different - query plan:

Hash Join  (cost=537057.49..8041177.88 rows=852150 width=63)
  Hash Cond: (table_A.ta_id = table_B.tb_id)
  ->  Seq Scan on table_A  (cost=0.00..3294347.71 rows=145561171 width=47)
  ->  Hash  (cost=521411.62..521411.62 rows=852150 width=20)
        ->  Bitmap Heap Scan on table_B  (cost=22454.78..521411.62 rows=852150 width=20)
              Recheck Cond: ((tb_id >= 0) AND (tb_id <= 1000000))
              ->  Bitmap Index Scan on table_B_pkey  (cost=0.00..22241.74 rows=852150 width=0)
                    Index Cond: ((tb_id >= 0) AND (tb_id <= 1000000))

Note: when I tried updating 1M records, the command was still running after 25 minutes before I killed it.

The sequential scan in the latter plan looks expensive, and (I think) supports what others have since mentioned, namely that when the optimizer moves to using sequential scans (working off the disk) things get a lot slower.

For me, the penny has finally dropped on why I should use EXPLAIN for bulk operations.

Thanks, too, to Greg Smith, Robert Klemme and Tomas for all the feedback.

Kind regards

Harry Mantheakis
London, UK