Re: [GENERAL] PostgreSQL not reusing free space in table ?

Поиск
Список
Период
Сортировка
От Merlin Moncure
Тема Re: [GENERAL] PostgreSQL not reusing free space in table ?
Дата
Msg-id CAHyXU0y-oSE-2O-hux-OWVQ_k5TY2qB-RZzCxw_8a=Tft8=0Bg@mail.gmail.com
обсуждение исходный текст
Ответ на [GENERAL] PostgreSQL not reusing free space in table ?  (Pierre Ducroquet <pierre.ducroquet@people-doc.com>)
Ответы Re: [GENERAL] PostgreSQL not reusing free space in table ?
Список pgsql-general
On Fri, Jan 6, 2017 at 4:09 AM, Pierre Ducroquet
<pierre.ducroquet@people-doc.com> wrote:
> Hi
>
> Running PostgreSQL 9.4, I am running in the following issue.
> On a huge table, I have to remove the content (set to '') of a column that
> makes for 99% of the weight of the table. Since the production can not be
> stopped, I will not be able to do a VACUUM FULL nor any direct rewrite of the
> table, so I tried the following trick to drop the content while reducing the
> table weight. (I re-indented it for this email)
>
> ppr=500
> for i in `seq 0 60` ; do
>         psql $MYDB -c "UPDATE $TABLE
>                 SET content = ''::bytea
>                 WHERE ctid = ANY(ARRAY(
>                         SELECT ('(' || p.i || ',' || s.i || ')')::tid
>                         FROM generate_series(`$i * $ppr`, `($i + 1) * $ppr`) p(i),
>                                  generate_series(0, 2048) AS s(i)
>                 ));"
>         psql $MYDB -c "VACUUM $TABLE;"
> done
>
> This reduced my table from 1200MB down to 240MB, 29000 pages, while I expected
> the final result to be even lower. I did a copy of the table, and it needed
> only 30MB, 3800 pages, so there was still something wrong.
> I did a simple (and slow) query to get the number of rows per page on both
> table. The new one had slightly more rows per page, but this was in no way
> enough to justify such a gap. Then I saw that after page 2080 the table had,
> for the following 27000 pages, between 0 and 5 rows per page. So vacuum could
> not remove the pages because there were gaps.
> I figured out that, after a vacuum, updating the rows at the end of the table
> should be put back at the beginning of the free space and thus after an update
> of the rows of the last 27000 pages the table could be reduced to the optimal
> size.
> But when I raw the following, I was very surprised (NB: I know this query does
> a full scan, I won't do it on a 100GB table, I promise)
>
> VACUUM VERBOSE $TABLE;
>
> SELECT count(*) FROM $TABLE WHERE ctid > '(29000,0)';
> => 29 rows
> UPDATE $TABLE SET content = ''::bytea WHERE ctid > '(29000,0)';
> => 29 rows updated
> SELECT count(*) FROM $TABLE WHERE ctid > '(29000,0)';
> ==> 29 rows again !
>
> So instead of filling huge gaps (I've got for instance an 80 pages gap, from
> id 2082 to id 2160), Pg put the data back at the end of the table.
>
>
> I'm quite puzzled by this situation. I believed PostgreSQL would use the free
> space to put the new data, so my update loop would give me a clean, «packed»
> table.
>
> What behaviour did I miss here ? How can I get PostgreSQL to use that free
> space without falling back to a vacuum full ? (And without using tools like
> pg_repack if possible, because this seems like not needed in that situation)

hm.  Maybe HOT is involved?  Does your column have an index on it?

merlin


В списке pgsql-general по дате отправления:

Предыдущее
От: Adrian Klaver
Дата:
Сообщение: Re: [GENERAL] psql error (encoding related?)
Следующее
От: Merlin Moncure
Дата:
Сообщение: Re: [GENERAL] Write-optimized data structures