Discussion: Long Running Update

Long Running Update

From
Harry Mantheakis
Date:
Hello

I am attempting to run an update statement that copies two fields from
one table to another:


UPDATE
   table_A
SET
(
   field_1
, field_2
) = (
table_B.field_1
, table_B.field_2
)
FROM
table_B
WHERE
table_B.id = table_A.id
;


Table "table_B" contains almost 75 million records, with IDs that match
those in "table_A".

Both "field_1" and "field_2" are DOUBLE PRECISION. The ID fields are
SERIAL primary-key integers in both tables.

I tested (the logic of) this statement with a very small sample, and it
worked correctly.

The database runs on a dedicated Debian server in our office.

I called both VACUUM and ANALYZE on the database before invoking this
statement.

The statement has been running for 18+ hours so far.

TOP, FREE and VMSTAT utilities indicate that only about half of the 6GB
of memory is being used, so I have no reason to believe that the server
is struggling.

My question is: can I reasonably expect a statement like this to
complete with such a large data-set, even if it takes several days?

We do not mind waiting, but obviously we do not want to wait unnecessarily.

Many thanks.

Harry Mantheakis
London, UK



Re: Long Running Update

From
Claudio Freire
Date:
On Thu, Jun 23, 2011 at 5:05 PM, Harry Mantheakis
<harry.mantheakis@riskcontrollimited.com> wrote:
> TOP, FREE and VMSTAT utilities indicate that only about half of the 6GB of
> memory is being used, so I have no reason to believe that the server is
> struggling.

You have a hinky idea of server load.

Mind you, there are lots of ways in which it could be struggling,
other than memory usage.
Like IO, CPU, lock contention...

In my experience, such huge updates struggle a lot with fsync and
random I/O when updating the indices.
It will be a lot faster if you can drop all indices (including the
PK), if you can.
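A rough sketch of that approach, with assumed object names (the real
constraint name can be checked with \d table_A in psql):

-- Hypothetical sketch: drop the primary key before the bulk update,
-- then re-create it afterwards (re-adding it rebuilds the index in one pass).
ALTER TABLE table_A DROP CONSTRAINT table_A_pkey;
-- ... run the big UPDATE here ...
ALTER TABLE table_A ADD CONSTRAINT table_A_pkey PRIMARY KEY (id);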

Re: Long Running Update

From
"Kevin Grittner"
Date:
Harry Mantheakis <harry.mantheakis@riskcontrollimited.com> wrote:

> UPDATE
>    table_A
> SET
> (
>    field_1
> , field_2
> ) = (
> table_B.field_1
> , table_B.field_2
> )
> FROM
> table_B
> WHERE
> table_B.id = table_A.id
> ;

I would have just done:

  SET field_1 = table_B.field_1, field_2 = table_B.field_2

instead of using row value constructors.  That might be slowing
things down a bit.
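Spelled out in full, that form of the statement would be something like:

UPDATE table_A
SET field_1 = table_B.field_1
  , field_2 = table_B.field_2
FROM table_B
WHERE table_B.id = table_A.id;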

> I tested (the logic of) this statement with a very small sample,
> and it worked correctly.

Always a good sign.  :-)

> The statement has been running for 18+ hours so far.

> My question is: can I reasonably expect a statement like this to
> complete with such a large data-set, even if it takes several
> days?

If it's not leaking memory, I expect that it will complete.

To get some sense of what it's doing, you could log on to another
connection and EXPLAIN the statement.  (NOTE: Be careful *not* to
use EXPLAIN ANALYZE.)

Another thing to consider if you run something like this again is
that an UPDATE is an awful lot like an INSERT combined with a
DELETE.  The way PostgreSQL MVCC works, the old version of each row
must remain until the updating transaction completes.  If you were
to divide this update into a series of updates by key range, the new
versions of the rows from later updates could re-use the space
previously occupied by the old version of rows from earlier updates.
For similar reasons, you might want to add something like this to
your WHERE clause, to prevent unnecessary updates:

  AND (table_B.field_1 IS DISTINCT FROM table_A.field_1
    OR table_B.field_2 IS DISTINCT FROM table_A.field_2);
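Putting the two ideas together, one slice of such a batched update might
look roughly like this (the 100,000-wide id range is only an illustrative
choice):

UPDATE table_A
SET field_1 = table_B.field_1
  , field_2 = table_B.field_2
FROM table_B
WHERE table_B.id = table_A.id
AND table_A.id >= 0 AND table_A.id < 100000
AND (table_B.field_1 IS DISTINCT FROM table_A.field_1
  OR table_B.field_2 IS DISTINCT FROM table_A.field_2);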

-Kevin

Re: Long Running Update

From
Harry Mantheakis
Date:
Thank you Kevin.

 > SET field_1 = table_B.field_1, field_2 = table_B.field_2

I will try that, if I have to next time.

 > add something like this to your WHERE clause,
 > to prevent unnecessary updates:
 >
 >   AND (table_B.field_1 IS DISTINCT FROM table_A.field_1
 >    OR table_B.field_2 IS DISTINCT FROM table_A.field_2);

Thank you for that explanation - I will keep that in mind in future. (In
this case, the two fields that are being updated are both known to be
empty - hence, distinct - in the target table.)

 > EXPLAIN the statement

Here is the EXPLAIN result:

----------------------------------------------------------------------
QUERY PLAN
----------------------------------------------------------------------
Hash Join  (cost=2589312.08..16596998.47 rows=74558048 width=63)
   Hash Cond: (table_A.id = table_B.id)
   ->  Seq Scan on table_A  (cost=0.00..1941825.05 rows=95612705 width=47)
   ->  Hash  (cost=1220472.48..1220472.48 rows=74558048 width=20)
         ->  Seq Scan on table_B  (cost=0.00..1220472.48 rows=74558048 width=20)
----------------------------------------------------------------------

The documentation says the 'cost' numbers are 'units of disk page fetches'.

Do you, by any chance, have any notion of how many disk page fetches can
be processed per second in practice - at least a rough idea?

IOW how do I convert - guesstimate! - these numbers into (plausible)
time values?

Kind regards

Harry Mantheakis
London, UK


Re: Long Running Update

From
Harry Mantheakis
Date:
Thank you Claudio.

 > there are lots of ways in which it could be struggling...

I have been monitoring the server with IOSTAT -d and IOSTAT -c and I
cannot see anything alarming.

 > It will be a lot faster if you can drop all indices...

This is counter-intuitive - because the WHERE clause is matching the
only two indexed fields, and my understanding is that querying on
indexed fields is faster than querying on fields that are not indexed.
(Note also, that the indexed field is NOT being updated.)

But if this update fails, I shall try what you suggest!

Kind regards

Harry Mantheakis
London, UK


Re: Long Running Update

From
Claudio Freire
Date:
On Fri, Jun 24, 2011 at 1:19 PM, Harry Mantheakis
<harry.mantheakis@riskcontrollimited.com> wrote:
>
>> there are lots of ways in which it could be struggling...
>
> I have been monitoring the server with IOSTAT -d and IOSTAT -c and I cannot
> see anything alarming.

If iostat doesn't show disk load, either iostat doesn't work well
(which could be the case, I've found a few broken installations here
and there), or, perhaps, your update is waiting on some other update.

I've seen cases where there are application-level deadlocks (i.e.,
deadlocks that the database alone cannot detect), and then your
queries stall like that. It happens quite frequently if you try such a
massive update on a loaded production server. In those cases, the
technique someone mentioned (updating in smaller batches) usually
works nicely.

You should be able to see if it's locked waiting for something with
"select * from pg_stat_activity".

Re: Long Running Update

From
Vitalii Tymchyshyn
Date:
On 24.06.11 14:16, Harry Mantheakis wrote:
>
> > EXPLAIN the statement
>
> Here is the EXPLAIN result:
>
> ----------------------------------------------------------------------
> QUERY PLAN
> ----------------------------------------------------------------------
> Hash Join (cost=2589312.08..16596998.47 rows=74558048 width=63)
> Hash Cond: (table_A.id = table_B.id)
> -> Seq Scan on table_A(cost=0.00..1941825.05 rows=95612705 width=47)
> -> Hash (cost=1220472.48..1220472.48 rows=74558048 width=20)
> -> Seq Scan on table_B(cost=0.00..1220472.48 rows=74558048 width=20)
> ----------------------------------------------------------------------
>
> The documentation says the 'cost' numbers are 'units of disk page
> fetches'.
>
> Do you, by any chance, have any notion of how many disk page fetches
> can be processed per second in practice - at least a rough idea?
>
> IOW how do I convert - guesstimate! - these numbers into (plausible)
> time values?
No chance. These are "virtual values" for the planner only.
If I read it correctly, your query should go through two phases: build a hash
map on one table, then update the second table using the map. Note that this
is only valid if you have no constraints (including foreign-key checks, on
both sides) to check on any field of the updated table. If you have any, you'd
better drop them.
Anyway, this is two sequential scans. For a long query I use a tool like
ktrace (FreeBSD) to see the system read/write calls the backend is doing. Then,
using the catalog tables, you can map file names to relations
(tables/indexes). That way you can see which stage you are on and how fast
it is going.
Note that partially cached tables are awful (in FreeBSD; dunno for
Linux) for such a query - I suppose this is because instead of a
sequential read you get a lot of random reads that fool the prefetch
logic. "dd if=table_file of=/dev/null bs=8m" helps me a lot. You can see
if it helps if CPU time goes up.
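The file-name-to-relation mapping mentioned above can be done through the
pg_class catalog; a minimal sketch, where the numeric value is a placeholder
taken from whatever file name the trace shows:

-- Map an on-disk file (its relfilenode number) back to a table or index.
SELECT relname, relkind
FROM pg_class
WHERE relfilenode = 123456;  -- placeholder value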

Best regards, Vitalii Tymchyshyn

Re: Long Running Update

From
Harry Mantheakis
Date:
Thanks again Claudio.

It does not look like it's locked/blocked - but your hint about doing
this task in smaller batches is a good one, and it would be easy enough
to organise.

I am going to let this task run over the week-end, and then decide.
Either way, I shall update this thread.

Much obliged!

Harry Mantheakis
London, UK


On 24/06/2011 12:39, Claudio Freire wrote:
> On Fri, Jun 24, 2011 at 1:19 PM, Harry Mantheakis
> <harry.mantheakis@riskcontrollimited.com>  wrote:
>>> there are lots of ways in which it could be struggling...
>> I have been monitoring the server with IOSTAT -d and IOSTAT -c and I cannot
>> see anything alarming.
> If iostat doesn't show disk load, either iostat doesn't work well
> (which could be the case, I've found a few broken installations here
> and there), or, perhaps, your update is waiting on some other update.
>
> I've seen cases where there are application-level deadlocks (i.e.,
> deadlocks that the database alone cannot detect), and then your
> queries stall like that. It happens quite frequently if you try such a
> massive update on a loaded production server. In those cases, the
> technique someone mentioned (updating in smaller batches) usually
> works nicely.
>
> You should be able to see if it's locked waiting for something with
> "select * from pg_stat_activity".

Re: Long Running Update

From
Mark Thornton
Date:
On 23/06/11 16:05, Harry Mantheakis wrote:
> Hello
>
> I am attempting to run an update statement that copies two fields from
> one table to another:
>
>
> UPDATE
>   table_A
> SET
> (
>   field_1
> , field_2
> ) = (
> table_B.field_1
> , table_B.field_2
> )
> FROM
> table_B
> WHERE
> table_B.id = table_A.id
> ;
>

I frequently get updates involving a FROM clause wrong --- the resulting
table is correct but the running time is quadratic. You might want to
try a series of smaller examples to see if your query displays this
behaviour.

Mark Thornton


Re: Long Running Update

From
"Kevin Grittner"
Date:
Mark Thornton <mthornton@optrak.com> wrote:
> On 23/06/11 16:05, Harry Mantheakis wrote:

>> UPDATE
>>   table_A
>> [ ... ]
>> FROM
>> table_B
>> WHERE
>> table_B.id = table_A.id

> I frequently get updates involving a FROM clause wrong --- the
> resulting table is correct but the running time is quadratic.

The most frequent way I've seen that happen is for someone to do:

UPDATE table_A
  [ ... ]
  FROM table_A a, table_B b
  WHERE b.id = a.id

Because a FROM clause on an UPDATE statement is not in the standard,
different products have implemented this differently.  In Sybase ASE
or Microsoft SQL Server you need to do the above to alias table_A,
and the two references to table_A are treated as one.  In PostgreSQL
this would be two separate references and you would effectively be
doing the full update of all rows in table_A once for every row in
table_A.  I don't think that is happening here based on the plan
posted earlier in the thread.
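In PostgreSQL the target table is simply left out of the FROM list, so the
correct equivalent is along these lines:

UPDATE table_A
SET field_1 = b.field_1
  , field_2 = b.field_2
FROM table_B b
WHERE b.id = table_A.id;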

-Kevin

Re: Long Running Update

From
"Kevin Grittner"
Date:
Harry Mantheakis <harry.mantheakis@riskcontrollimited.com> wrote:

> IOW how do I convert - guesstimate! - these numbers into
> (plausible) time values?

They are abstract and it only matters that they are in the right
ratios to one another so that the planner can accurately pick the
cheapest plan.  With the default settings, seq_page_cost is 1, so if
everything is tuned perfectly, the run time should match the time it
takes to sequentially read a number of pages (normally 8KB) which
matches the estimated cost.  So with 8KB pages and seq_page_cost =
1, the cost number says it should take the same amount of time as a
sequential read of 130 GB.
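For the plan posted earlier that works out to roughly:

  16,596,998 pages x 8 KB per page ~= 132,800,000 KB ~= 127 GiB

which is where the ~130 GB figure comes from.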

The biggest reason this won't be close to actual run time is that
the planner just estimates the cost of *getting to* the correct
tuples for update, implicitly assuming that the actual cost of the
updates will be the same regardless of how you find the tuples to be
updated.  So if your costs were set in perfect proportion to
reality, with seq_page_cost = 1, the above would tell you how fast a
SELECT of the data to be updated should be.  The cost numbers don't
really give a clue about the time to actually UPDATE the rows.

-Kevin

Re: Long Running Update

From
"Kevin Grittner"
Date:
Harry Mantheakis <harry.mantheakis@riskcontrollimited.com> wrote:

>> It will be a lot faster if you can drop all indices...
>
> This is counter-intuitive - because the WHERE clause is matching
> the only two indexed fields, and my understanding is that querying
> on indexed fields is faster than querying on fields that are not
> indexed.

Because your UPDATE requires reading every tuple in every page of
both tables, it would be counter-productive to use the indexes.
Random access is much slower than sequential, so it's fastest to
just blast through both tables sequentially.  The plan you showed
has it scanning through table_B and loading the needed data into RAM
in a hash table, then scanning through table_A and updating each row
based on what is in the RAM hash table.

For each row updated, if it isn't a HOT update, a new entry must be
inserted into every index on table_A, so dropping the indexes before
the update and re-creating them afterward would probably be a very
good idea if you're going to do the whole table in one go, and
possibly even if you're working in smaller chunks.

One thing which might help run time a lot, especially since you
mentioned having a lot of unused RAM, is to run the update with a
very high work_mem setting in the session running the UPDATE.
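For example (the value is only an illustration - it needs to fit within the
unused RAM):

SET work_mem = '1GB';  -- per-session setting; reverts when the connection closes
-- ... then run the UPDATE in the same session ...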

> (Note also, that the indexed field is NOT being updated.)

That's one of the conditions for a HOT update.  The other is that
there is space available on the same page for a new version of the
row, in addition to the old version.

-Kevin

Re: Long Running Update

From
Harry Mantheakis
Date:
Thank you so much for all the feedback, Kevin - much appreciated.

I have stopped the all-in-one-go update from executing, and now I am
executing a series of statements, each constrained to update no more
than 100,000 records at a time.

Interesting fact: updating 100,000 rows takes 5 seconds. Quick.

I tried updating 1 million rows in one go, and the statement was still
running after 25 minutes, before I killed it!

So I am now executing an SQL script with almost 800 separate
update-statements, each set to update 100K records, and the thing is
trundling along nicely. (I am fortunate enough to be surrounded by
MatLab users, one of whom generated the 800-statement file in one minute
flat!)
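For what it's worth, the same file could also have been generated from within
psql itself - a hypothetical sketch, assuming the batches are keyed on the id
column in steps of 100,000 (capture the output with \t and \o batch.sql):

SELECT 'UPDATE table_A'
    || ' SET field_1 = table_B.field_1, field_2 = table_B.field_2'
    || ' FROM table_B'
    || ' WHERE table_B.id = table_A.id'
    || ' AND table_A.id >= ' || start
    || ' AND table_A.id < '  || (start + 100000) || ';'
FROM generate_series(0, 79900000, 100000) AS s(start);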

Many thanks again for all the info.

Kind regards

Harry Mantheakis
London, UK


Re: Long Running Update

From
Harry Mantheakis
Date:
 > try a series of smaller examples...

Mark, that was the tip the saved me!

Many thanks.

Harry Mantheakis
London, UK


Re: Long Running Update

From
"Kevin Grittner"
Date:
Harry Mantheakis <harry.mantheakis@riskcontrollimited.com> wrote:

> I have stopped the all-in-one-go update from executing, and now I
> am executing a series of statements, each constrained to update no
> more than 100,000 records at a time.
>
> Interesting fact: updating 100,000 rows takes 5 seconds. Quick.

One last thing -- all these updates, including the aborted attempt at
a single-pass update, may cause significant bloat in both the heap
and the index(es).  I usually finish up with a CLUSTER on the table,
followed by a VACUUM ANALYZE on the table.
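In this case that would be something like the following (the index name is an
assumption; note that CLUSTER takes an exclusive lock on the table while it
runs):

CLUSTER table_A USING table_A_pkey;  -- rewrites the table, discarding dead rows
VACUUM ANALYZE table_A;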

-Kevin

Re: Long Running Update

From
Harry Mantheakis
Date:
I called VACUUM on the database after abandoning the original all-in-one
update.

And I have a daily cron script that executes the following statement:

sudo -u postgres /usr/bin/vacuumdb -U postgres --all --analyze

But I had not considered using CLUSTER - I will certainly look into that.

Thanks again.

Harry Mantheakis
London, UK


Re: Long Running Update - My Solution

From
Harry Mantheakis
Date:
I am glad to report that the 'salami-slice' approach worked nicely - all
done in about 2.5 hours.

Instead of using an all-in-one-go statement, we executed 800 statements,
each updating 100,000 records. On average it took about 10 seconds for
each statement to return.

This is "thinking out of the box" solution, which others might not be
able to emulate.

The mystery remains, for me: why updating 100,000 records could complete
in as quickly as 5 seconds, whereas an attempt to update a million
records was still running after 25 minutes before we killed it?

One thing remains crystal clear: I love Postgresql :-)

Kind regards

Harry Mantheakis
London, UK


On 23/06/2011 16:05, Harry Mantheakis wrote:
> Hello
>
> I am attempting to run an update statement that copies two fields from
> one table to another:
>
>
> UPDATE
>   table_A
> SET
> (
>   field_1
> , field_2
> ) = (
> table_B.field_1
> , table_B.field_2
> )
> FROM
> table_B
> WHERE
> table_B.id = table_A.id
> ;
>
>
> Table "table_B" contains almost 75 million records, with IDs that
> match those in "table_A".
>
> Both "field_1" and "field_2" are DOUBLE PRECISION. The ID fields are
> SERIAL primary-key integers in both tables.
>
> I tested (the logic of) this statement with a very small sample, and
> it worked correctly.
>
> The database runs on a dedicated Debian server in our office.
>
> I called both VACUUM and ANALYZE on the database before invoking this
> statement.
>
> The statement has been running for 18+ hours so far.
>
> TOP, FREE and VMSTAT utilities indicate that only about half of the
> 6GB of memory is being used, so I have no reason to believe that the
> server is struggling.
>
> My question is: can I reasonably expect a statement like this to
> complete with such a large data-set, even if it takes several days?
>
> We do not mind waiting, but obviously we do not want to wait
> unnecessarily.
>
> Many thanks.
>
> Harry Mantheakis
> London, UK
>
>
>

Re: Long Running Update - My Solution

From
"Kevin Grittner"
Date:
Harry Mantheakis <harry.mantheakis@riskcontrollimited.com> wrote:

> I am glad to report that the 'salami-slice' approach worked nicely
> - all done in about 2.5 hours.

Glad to hear it!

> The mystery remains, for me: why updating 100,000 records could
> complete in as quickly as 5 seconds, whereas an attempt to update
> a million records was still running after 25 minutes before we
> killed it?

If you use EXPLAIN with both statements (without ANALYZE, since you
probably don't want to trigger an actual *run* of the statement), is
there a completely different plan for the range covering each?  If
so, a severe jump like that might mean that your costing parameters
could use some adjustment, so that it switches from one plan to the
other closer to the actual break-even point.
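The parameters in question are the planner cost settings; purely as an
illustration (these values are not recommendations), they can be tried per
session like this:

SET random_page_cost = 2.0;        -- default is 4.0; lower it if random I/O is well cached
SET effective_cache_size = '4GB';  -- tell the planner how much data the OS is likely to cache
-- ...then re-run EXPLAIN on both batch sizes to see where the plan flips.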

> One thing remains crystal clear: I love Postgresql :-)

:-)

-Kevin

Re: Long Running Update - My Solution

From
tv@fuzzy.cz
Date:
> The mystery remains, for me: why updating 100,000 records could complete
> in as quickly as 5 seconds, whereas an attempt to update a million
> records was still running after 25 minutes before we killed it?

Hi, there are a lot of possible causes. Usually this is caused by a plan
change - imagine for example that you need to sort a table and the amount
of data just fits into work_mem, so that it can be sorted in memory. If
you need to perform the same query with 10x the data, you'll have to sort
the data on disk. Which is way slower, of course.

And there are other such problems ...

> One thing remains crystal clear: I love Postgresql :-)

regards
Tomas


Re: Long Running Update - My Solution

From
Robert Klemme
Date:
On Mon, Jun 27, 2011 at 5:37 PM,  <tv@fuzzy.cz> wrote:
>> The mystery remains, for me: why updating 100,000 records could complete
>> in as quickly as 5 seconds, whereas an attempt to update a million
>> records was still running after 25 minutes before we killed it?
>
> Hi, there's a lot of possible causes. Usually this is caused by a plan
> change - imagine for example that you need to sort a table and the amount
> of data just fits into work_mem, so that it can be sorted in memory. If
> you need to perform the same query with 10x the data, you'll have to sort
> the data on disk. Which is way slower, of course.
>
> And there are other such problems ...

I would rather assume it is one of the "other problems", typically
related to handling the TX (e.g. checkpoints, WAL, creating copies of
modified records and adjusting indexes...).

Kind regards

robert


--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

Re: Long Running Update - My Solution

From
Greg Smith
Date:
Harry Mantheakis wrote:
> The mystery remains, for me: why updating 100,000 records could
> complete in as quickly as 5 seconds, whereas an attempt to update a
> million records was still running after 25 minutes before we killed it?

The way you were doing this originally, it was joining every record in
table A against every record in table B, finding the matches (note the
sequential scans on each in the query plan you showed).  Having A * B
possible matches there was using up a bunch of resources to line those
two up for an efficient join, and it's possible that parts of that
required spilling working data over to disk and other expensive
operations.  And you were guaranteeing that every row in each table was
being processed in some way.

Now, when you only took a small slice of A instead, and a small slice of
B to match, this was likely using an index and working with far fewer
rows in total--only the ones in B that mattered were considered, not every
one in B.  And each processing slice was working on fewer rows, making it
more likely to fit in memory, and thus avoiding both slow spill-to-disk
operations and work that was less likely to fit into the system cache.

I don't know exactly how much of each of these two components went into
your large run-time difference, but I suspect both were involved.  The
way the optimizer switches to using a sequential scan when doing bulk
operations is often the right move.  But if it happens in a way that
causes the set of data to be processed to become much larger than RAM,
it can be a bad decision.  The performance drop when things stop fitting
in memory is not a slow one, it's like a giant cliff you fall off.

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books


Re: Long Running Update - My Solution

From
Harry Mantheakis
Date:
Hello Kevin

 > If you use EXPLAIN with both statements...

Yes, the plans are indeed very different.

Here is the statement, set to update up to 100,000 records, which took
about 5 seconds to complete:


UPDATE
   table_A
SET
   field_1 = table_B.field_1
, field_2 = table_B.field_2
FROM
   table_B
WHERE
   table_B.tb_id >= 0
AND
   table_B.tb_id <= 100000
AND
   table_B.tb_id = table_A.ta_id
;


The query plan for the above is:


Nested Loop  (cost=0.00..2127044.47 rows=73620 width=63)
   ->  Index Scan using table_B_pkey on table_B  (cost=0.00..151830.75 rows=73620 width=20)
         Index Cond: ((tb_id >= 0) AND (tb_id <= 100000))
   ->  Index Scan using table_A_pkey on table_A  (cost=0.00..26.82 rows=1 width=47)
         Index Cond: (table_A.ta_id = table_B.tb_id)


Now, if I change the first AND clause to update 1M records, as follows:


table_B.tb_id <= 1000000


I get the following - quite different - query plan:


Hash Join  (cost=537057.49..8041177.88 rows=852150 width=63)
   Hash Cond: (table_A.ta_id = table_B.tb_id)
   ->  Seq Scan on table_A  (cost=0.00..3294347.71 rows=145561171 width=47)
   ->  Hash  (cost=521411.62..521411.62 rows=852150 width=20)
         ->  Bitmap Heap Scan on table_B  (cost=22454.78..521411.62 rows=852150 width=20)
               Recheck Cond: ((tb_id >= 0) AND (tb_id <= 1000000))
               ->  Bitmap Index Scan on table_B_pkey  (cost=0.00..22241.74 rows=852150 width=0)
                     Index Cond: ((tb_id >= 0) AND (tb_id <= 1000000))


Note: When I tried updating 1M records, the command was still running
after 25 minutes before I killed it.

The sequential scan in the later plan looks expensive, and (I think)
supports what others have since mentioned, namely that when the
optimizer moves to using sequential scans (working off the disk) things
get a lot slower.

For me, the penny has finally dropped on why I should use EXPLAIN for
bulk operations.

Thanks, too, to Greg Smith, Robert Klemme and Tomas for all the feedback.

Kind regards

Harry Mantheakis
London, UK