Re: making relfilenodes 56 bits

Поиск
Список
Период
Сортировка
От Dilip Kumar
Тема Re: making relfilenodes 56 bits
Дата
Msg-id CAFiTN-tVnrcgdXCo_XgHja8=JqcqyssWV7uVdkEUTkHo7TiyBQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: making relfilenodes 56 bits  (Robert Haas <robertmhaas@gmail.com>)
Ответы Re: making relfilenodes 56 bits  (Dilip Kumar <dilipbalaut@gmail.com>)
Список pgsql-hackers
On Tue, Aug 23, 2022 at 8:00 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Tue, Aug 23, 2022 at 2:06 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > OTOH, if we keep the two separate ranges for the user and system table
> > then we don't need all this complex logic of conflict checking.
>
> True. That's the downside. The question is whether it's worth adding
> some complexity to avoid needing separate ranges.

Other than complexity, we will have to check the conflict for all the
user table's relfilenumber from the old cluster into the hash build on
the new cluster's relfilenumber, isn't it extra overhead if there are
a lot of user tables?  But I think we are already restoring all those
tables in the new cluster so compared to that it will be very small.

> Honestly, if we don't care about having separate ranges, we can do
> something even simpler and just make the starting relfilenumber for
> system tables same as the OID. Then we don't have to do anything at
> all, outside of not changing the OID assigned to pg_largeobject in a
> future release. Then as long as pg_upgrade is targeting a new cluster
> with completely fresh databases that have not had any system table
> rewrites so far, there can't be any conflict.

I think having the OID-based system and having two ranges are not
exactly the same.  Because if we have the OID-based relfilenumber
allocation for system table (initially) and then later allocate from
the nextRelFileNumber counter then it seems like a mix of old system
(where actually OID and relfilenumber are tightly connected) and the
new system where nextRelFileNumber is completely independent counter.
OTOH having two ranges means logically we are not making dependent on
OID we are just allocating from a central counter but after catalog
initialization, we will leave some gap and start from a new range. So
I don't think this system is hard to explain.

> And perhaps that is the best solution after all, but while it is
> simple in terms of code, I feel it's a bit complicated for human
> beings. It's very simple to understand the scheme that Amit proposed:
> if there's anything in the new cluster that would conflict, we move it
> out of the way. We don't have to assume the new cluster hasn't had any
> table rewrites. We don't have to nail down starting relfilenumber
> assignments for system tables. We don't have to worry about
> relfilenumber or OID assignments changing between releases.
> pg_largeobject is not a special case. There are no special ranges of
> OIDs or relfilenumbers required. It just straight up works -- all the
> time, no matter what, end of story.

I agree on this that this system is easy to explain that we just
rewrite anything that conflicts so looks more future-proof.  Okay, I
will try this solution and post the patch.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Amit Kapila
Дата:
Сообщение: Re: Perform streaming logical transactions by background workers and parallel apply
Следующее
От: Michael Paquier
Дата:
Сообщение: Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work