Re: pg_dump versus hash partitioning

From Robert Haas
Subject Re: pg_dump versus hash partitioning
Date
Msg-id CA+TgmoavmdU6JFLHoRG02OpNBcKa1qAUMeSEC_tcvYQe9pyFuQ@mail.gmail.com
In response to Re: pg_dump versus hash partitioning  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: pg_dump versus hash partitioning  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Wed, Feb 1, 2023 at 4:12 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > I don't think the fact that our *traditional* standard for how stable
> > a hash function needs to be has been XYZ carries any water.
>
> Well, it wouldn't need to if we had a practical way of changing the
> behavior of an existing hash function, but guess what: we don't.
> Andrew's original proposal for fixing this was exactly to change the
> behavior of hashenum().  There were some problems with the idea of
> depending on enumsortorder instead of enum OID, but the really
> fundamental issue is that you can't change hashing behavior without
> breaking pg_upgrade completely.  Not only will your hash indexes be
> corrupt, but your hash-partitioned tables will be broken, in exactly
> the same way that we're trying to solve for dump/reload cases (which
> of course will *also* be broken by redefining the hash function, if
> you didn't use --load-via-partition-root).  Moreover, while we can
> always advise people to reindex, there's no similarly easy way to fix
> broken partitioning.
>
> That being the case, I don't think moving the goalposts for hash
> function stability is going to lead to a workable solution.

I don't see that there is any easy, clean way to solve this in
released branches. The idea that I proposed could be implemented in
master, and I think it is the right kind of fix, but it is not
back-patchable. However, I think your argument rests on the premise
that making --load-via-partition-root the default behavior in some or
all cases will not break anything for anyone, and I'm skeptical. I
think that's a significant behavior change and that some people will
notice, and some will find it an improvement while others will find it
worse than the current behavior. I also think that there must be a lot
more people using partitioning in general, and even hash partitioning
specifically, than there are people using hash partitioning on an enum
column.

Personally, I would rather disallow this case in the back-branches --
i.e. make pg_dump barf if it is encountered and block CREATE TABLE
from setting up any new situations of this type -- than foist
--load-via-partition-root on many people who aren't affected by the
issue. I'm not saying that's a great answer, but we have to pick from
the choices we have.

I also don't accept that if someone has hit this issue they are just
hosed and there's no way out. Yeah, it's not a lot of fun: you
probably have to use "CREATE TABLE unpartitioned AS SELECT * FROM
borked; DROP TABLE borked;" or so to rescue your data. But what would
we do if we discovered that the btree opclass sorts 1 before 0, or
something? Surely we wouldn't refuse to fix the opclass just because
some users have existing indexes on disk that would be out of order
with the new opclass definition. We'd just change it and people would
have to deal. People with indexes would need to reindex. People with
partitioning boundaries between 0 and 1 would need to repartition.
This case isn't the same because hashenum() isn't broken in general,
just for this particular purpose. But I think you're trying to argue
that we should fix this by changing something other than the thing
that is broken, and I don't agree with that.
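
For readers following the thread, the failure mode under discussion can be
sketched with a toy model. This is only an illustration under stated
assumptions: toy_hash() is a made-up stand-in for hashenum() (it is not
PostgreSQL's hash function), the OIDs are invented, and partition_for() and
misplaced() are hypothetical helpers. The one thing taken from the thread is
the premise that the hash depends on the enum OID, which is not preserved
across dump and reload.

```python
# Toy model only: toy_hash() stands in for hashenum(), and all OIDs are made up.

def toy_hash(oid):
    """Hypothetical stand-in for a hash function keyed on the enum OID."""
    return oid * 31 + 7


def partition_for(label, oids, modulus):
    """Pick the hash partition for an enum label, given one cluster's OIDs."""
    return toy_hash(oids[label]) % modulus


MODULUS = 4
labels = ["red", "green", "blue"]

# The same enum labels receive different OIDs in the old and restored clusters.
old_oids = {"red": 16384, "green": 16386, "blue": 16388}
new_oids = {"red": 16385, "green": 16387, "blue": 16389}

# Where each row lived in the old cluster.
old_layout = {label: partition_for(label, old_oids, MODULUS) for label in labels}

# Per-partition restore: each row goes back into the partition it was dumped from.
restored_direct = dict(old_layout)

# Restore through the partition root: each row is re-routed using the new OIDs.
restored_via_root = {
    label: partition_for(label, new_oids, MODULUS) for label in labels
}


def misplaced(layout, oids):
    """Labels whose stored partition disagrees with what the hash now says."""
    return sorted(
        label
        for label, part in layout.items()
        if part != partition_for(label, oids, MODULUS)
    )


print("misplaced after per-partition restore:", misplaced(restored_direct, new_oids))
print("misplaced after restore via root:", misplaced(restored_via_root, new_oids))
```

In this toy setup every row ends up in the wrong partition after a
per-partition restore, while routing through the root leaves none misplaced.
It says nothing about whether making that routing the default is a good idea,
which is the actual point of disagreement.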

-- 
Robert Haas
EDB: http://www.enterprisedb.com


