Re: Disabling Heap-Only Tuples

Поиск
Список
Период
Сортировка
От Andres Freund
Тема Re: Disabling Heap-Only Tuples
Дата
Msg-id 20230919185540.cddpceq3cdqsoxx4@awork3.anarazel.de
обсуждение исходный текст
Ответ на Re: Disabling Heap-Only Tuples  (Matthias van de Meent <boekewurm+postgres@gmail.com>)
Список pgsql-hackers
Hi,

On 2023-09-19 19:33:22 +0200, Matthias van de Meent wrote:
> On Tue, 19 Sept 2023 at 18:56, Andres Freund <andres@anarazel.de> wrote:
> >
> > Hi,
> >
> > On 2023-09-19 18:30:44 +0200, Alvaro Herrera wrote:
> > > This makes me think that maybe the logic needs to be a little more
> > > complex to avoid the problem you describe: if an UPDATE is prevented
> > > from being HOT because of this setting, but then it goes and consults
> > > FSM and it gives the update a higher block number than the tuple's
> > > current block (or it fails to give a block number at all so it is forced
> > > to extend the relation), then the update should give up on that strategy
> > > and use a HOT update after all.  (I have not read the actual patch;
> > > maybe it already does this?  It sounds kinda obvious.)
> >
> > Yea, a setting like what's discussed here seems, uh, not particularly useful
> > for achieving the goal of compacting tables.  I don't think guiding this
> > through SQL makes a lot of sense. For decent compaction you'd want to scan the
> > table backwards, and move rows from the end to earlier, but stop once
> > everything is filled up. You can somewhat do that from SQL, but it's going to
> > be awkward and slow.  I doubt you even want to use the normal UPDATE WAL
> > logging.
>
> We can't move tuples around (or, not that I know of) without using a
> transaction ID to control the visibility of the two locations of that
> tuple.

Correct, otherwise you'd end up with broken visibility in scans (seeing the
same tuple twice or never).


> Doing table compaction would thus likely require using transactions to move
> these tuples around.

Yes - but I don't think that has to be a problem. I'd expect something like
this to use multiple transactions internally. Possibly optimizing xid usage by
checking if other transactions are currently waiting on the xid and committing
if that's the case. Processing a single page should be quite fast, so the
maximum delay on other sessions is quite small.


> Using a single backend and bulk operations, it'll still lock each tuple that
> is being moved, and that can be noticed by user DML queries. I'd rather make
> the user's queries move the data around than this long-duration, locking
> background operation.

I doubt that works well enough in practice. It's very common to have tuples
that aren't updated after some point. So you then end up with needing tooling
that triggers UPDATEs for tuples at the end of the relation.


> > I think having explicit compaction support in VACUUM or somewhere similar
> > would make sense, but I don't think the proposed GUC is a useful stepping
> > stone.
>
> The point of this GUC is that the compaction can happen organically in
> the user's UPDATE workflow, so that there is no long locking operation
> going on (as you would see with VACUUM FULL / CLUSTER / pg_repack).

It certainly shouldn't use an AEL. I think we could even get away without an
SUE (it's basically just UPDATEs after all), but whether it's worth doing that
I'm not sure.


> > > > But without any kind of auto-tuning, in my opinion, it's a fairly poor
> > > > feature. Sure, some people will get use out of it, if they're
> > > > sufficiently knowledgeable and sufficiently determined. But I think
> > > > for most people in most situations, it will be a struggle.
> >
> > Indeed. I think it'd often just explode table and index sizes, because HOT
> > pruning won't be able to make usable space in pages anymore (due to dead
> > items).
>
> You seem to misunderstand the latest patch. It explicitly only blocks
> local updates if the update can then move the new tuple to an earlier
> page. If that is not possible, then it'll insert locally (assuming
> that is still possible) and HOT can then still apply.

I indeed apparently had looked at the wrong patch. But I still don't think
this is a useful way of controlling this.  I guess it could be a small part of
something larger, but you are going to need something that actively updates
tuples at the end of the table, otherwise it's very unlikely in practice that
you'll ever be able to shrink the table.


Leaving aside what process "moves" tuples, I doubt that controlling "moving"
via the table size is useful. Controlling via the amount free space in the FSM
would make more sense. If there's no known free space in the FSM, this
approach can't compact. Using the table size to control also means that the
value needs to be updated with the growth of the table. Whereas controlling
moving via a percentage of free space in the FSM would allow the same setting
to be used even for a growing (or shrinking) table.

Greetings,

Andres Freund



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Robert Haas
Дата:
Сообщение: Re: Disabling Heap-Only Tuples
Следующее
От: Tomas Vondra
Дата:
Сообщение: Re: dikkop seems unhappy because of openssl stuff (FreeBSD 14-BETA1)