Re: Long Running Update - My Solution

Поиск
Список
Период
Сортировка
От Greg Smith
Тема Re: Long Running Update - My Solution
Дата
Msg-id 4E092234.30200@2ndquadrant.com
обсуждение исходный текст
Ответ на Re: Long Running Update - My Solution  (Harry Mantheakis <harry.mantheakis@riskcontrollimited.com>)
Список pgsql-performance
Harry Mantheakis wrote:
> The mystery remains, for me: why updating 100,000 records could
> complete in as quickly as 5 seconds, whereas an attempt to update a
> million records was still running after 25 minutes before we killed it?

The way you were doing this originally, it was joining every record in
table A against every record in table B, finding the matches (note the
sequential scans on each in the query plan you showed).  Having A * B
possible matches there was using up a bunch of resources to line those
two up for an efficient join, and it's possible that parts of that
required spilling working data over to disk and other expensive
operations.  And you were guaranteeing that every row in each table was
being processed in some way.

Now, when you only took a small slice of A instead, and a small slice of
B to match, this was likely using an index and working with a lot less
rows in total--only ones in B that mattered were considered, not every
one in B.  And each processing slice was working on less rows, making it
more likely to fit in memory, and thus avoiding both slow spill to disk
operation and work that was less likely to fit into the system cache.

I don't know exactly how much of each of these two components went into
your large run-time difference, but I suspect both were involved.  The
way the optimizer switches to using a sequential scan when doing bulk
operations is often the right move.  But if it happens in a way that
causes the set of data to be processed to become much larger than RAM,
it can be a bad decision.  The performance drop when things stop fitting
in memory is not a slow one, it's like a giant cliff you fall off.

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books


В списке pgsql-performance по дате отправления:

Предыдущее
От: Denis de Bernardy
Дата:
Сообщение: Fw: Getting rid of a seq scan in query on a large table
Следующее
От: Harry Mantheakis
Дата:
Сообщение: Re: Long Running Update - My Solution