Re: pg_dump versus hash partitioning

Поиск

Список

Период

Сортировка

От	Robert Haas
Тема	Re: pg_dump versus hash partitioning
Дата	1 февраля 2023 г. 21:52:30
Msg-id	CA+TgmoavmdU6JFLHoRG02OpNBcKa1qAUMeSEC_tcvYQe9pyFuQ@mail.gmail.com обсуждение исходный текст
Ответ на	Re: pg_dump versus hash partitioning (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы	Re: pg_dump versus hash partitioning
Список	pgsql-hackers

Дерево обсуждения

On Wed, Feb 1, 2023 at 4:12 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > I don't think the fact that our *traditional* standard for how stable
> > a hash function needs to be has been XYZ carries any water.
>
> Well, it wouldn't need to if we had a practical way of changing the
> behavior of an existing hash function, but guess what: we don't.
> Andrew's original proposal for fixing this was exactly to change the
> behavior of hashenum().  There were some problems with the idea of
> depending on enumsortorder instead of enum OID, but the really
> fundamental issue is that you can't change hashing behavior without
> breaking pg_upgrade completely.  Not only will your hash indexes be
> corrupt, but your hash-partitioned tables will be broken, in exactly
> the same way that we're trying to solve for dump/reload cases (which
> of course will *also* be broken by redefining the hash function, if
> you didn't use --load-via-partition-root).  Moreover, while we can
> always advise people to reindex, there's no similarly easy way to fix
> broken partitioning.
>
> That being the case, I don't think moving the goalposts for hash
> function stability is going to lead to a workable solution.

I don't see that there is any easy, clean way to solve this in
released branches. The idea that I proposed could be implemented in
master, and I think it is the right kind of fix, but it is not
back-patchable. However, I think your argument rests on the premise
that making --load-via-partition-root the default behavior in some or
all cases will not break anything for anyone, and I'm skeptical. I
think that's a significant behavior change and that some people will
notice, and some will find it an improvement while others will find it
worse than the current behavior. I also think that there must be a lot
more people using partitioning in general, and even hash partitioning
specifically, than there are people using hash partitioning on an enum
column.

Personally, I would rather disallow this case in the back-branches --
i.e. make pg_dump barf if it is encountered and block CREATE TABLE
from setting up any new situations of this type -- than foist
--load-via-partition-root on many people who aren't affected by the
issue. I'm not saying that's a great answer, but we have to pick from
the choices we have.

I also don't accept that if someone has hit this issue they are just
hosed and there's no way out. Yeah, it's not a lot of fun: you
probably have to use "CREATE TABLE unpartitioned AS SELECT * FROM
borked; DROP TABLE borked;" or so to rescue your data. But what would
we do if we discovered that the btree opclass sorts 1 before 0, or
something? Surely we wouldn't refuse to fix the opclass just because
some users have existing indexes on disk that would be out of order
with the new opclass definition. We'd just change it and people would
have to deal. People with indexes would need to reindex. People with
partitioning boundaries between 0 and 1 would need to repartition.
This case isn't the same because hashenum() isn't broken in general,
just for this particular purpose. But I think you're trying to argue
that we should fix this by changing something other than the thing
that is broken, and I don't agree with that.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

В списке pgsql-hackers по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: pg_dump versus hash partitioning