Re: enhance the efficiency of migrating particularly large tables

Поиск
Список
Период
Сортировка
От David Rowley
Тема Re: enhance the efficiency of migrating particularly large tables
Дата
Msg-id CAApHDvr=FLi63sPDZUokKtC094EcOz_suGsLgmNJs2U+WkysRA@mail.gmail.com
обсуждение исходный текст
Ответ на [MASSMAIL]enhance the efficiency of migrating particularly large tables  (David Zhang <david.zhang@highgo.ca>)
Ответы Re: enhance the efficiency of migrating particularly large tables  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: enhance the efficiency of migrating particularly large tables  (David Zhang <david.zhang@highgo.ca>)
Список pgsql-hackers
On Tue, 9 Apr 2024 at 09:52, David Zhang <david.zhang@highgo.ca> wrote:
> However, when executing SELECT min(ctid) and max(ctid), it performs a
> Seq Scan, which can be slow for a large table. Is there a way to
> retrieve the minimum and maximum ctid other than using the system
> functions min() and max()?

Finding the exact ctid seems overkill for what you need.  Why you
could just find the maximum block with:

N = pg_relation_size('name_of_your_table'::regclass) /
current_Setting('block_size')::int;

and do WHERE ctid < '(N,1)';

If we wanted to optimise this in PostgreSQL, the way to do it would
be, around set_plain_rel_pathlist(), check if the relation's ctid is a
required PathKey by the same means as create_index_paths() does, then
if found, create another seqscan path without synchronize_seqscans *
and tag that with the ctid PathKey sending the scan direction
according to the PathKey direction. nulls_first does not matter since
ctid cannot be NULL.

Min(ctid) query should be able to make use of this as the planner
should rewrite those to subqueries with a ORDER BY ctid LIMIT 1.

* We'd need to invent an actual Path type for SeqScanPath as I see
create_seqscan_path() just uses the base struct Path.
synchronize_seqscans would have to become a property of that new Path
type and it would need to be carried forward into the plan and looked
at in the executor so that we always start a scan at the first or last
block.

Unsure if such a feature is worthwhile. I think maybe not for just
min(ctid)/max(ctid). However, there could be other reasons, such as
the transform OR to UNION stuff that Tom worked on a few years ago.
That needed to eliminate duplicate rows that matched both OR branches
and that was done using ctid.

David



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Andrew Dunstan
Дата:
Сообщение: Re: WIP Incremental JSON Parser
Следующее
От: Tomas Vondra
Дата:
Сообщение: Re: PostgreSQL 17 Release Management Team & Feature Freeze