Re: Online enabling of checksums

From Magnus Hagander
Subject Re: Online enabling of checksums
Date
Msg-id CABUevEzMuHn6Hc2GeCrjcefxXTnwdMb0Fg7zPkMCH-EArA5suA@mail.gmail.com
In response to Re: Online enabling of checksums  (Andres Freund <andres@anarazel.de>)
Responses Re: Online enabling of checksums  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
On Sat, Feb 24, 2018 at 10:49 PM, Andres Freund <andres@anarazel.de> wrote:
On 2018-02-24 22:45:09 +0100, Magnus Hagander wrote:
> Is it really that invisible? Given how much we argue over adding single
> counters to the stats system, I'm not sure it's quite that low.

That appears to be entirely unrelated. The stats stuff is expensive
because we currently have to essentially write out the stats for *all*
tables in a database, once a counter is updated. And those counters are
obviously constantly updated. Thus the overhead of adding one column is
essentially multiplied by the number of tables in the system. Whereas
here it's a single column that can be updated on a per-row basis, which
is barely ever going to be written to.

Am I missing something?

It's probably at least partially unrelated, you are right. I may have misread our reluctance to add more values there as a general reluctance to add more values to central columns.

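
To make the difference in scale concrete, here is a rough back-of-the-envelope sketch of the two costs being compared. Every number in it (counter size, flag size, table count, flush frequency) is a hypothetical assumption chosen for illustration, not a measurement, and the function names are invented for this example.

```python
# Rough cost model, assuming (as described above) that the stats for all
# tables in a database are written out whenever counters are flushed.
# All sizes and counts below are hypothetical illustration values.

def extra_stats_bytes_per_flush(num_tables: int, counter_bytes: int = 8) -> int:
    """Extra bytes written per stats flush if one counter is added per table."""
    return num_tables * counter_bytes


def extra_catalog_bytes(num_tables: int, flag_bytes: int = 1) -> int:
    """Extra bytes stored once if one small column is added to each catalog row."""
    return num_tables * flag_bytes


tables = 10_000            # hypothetical number of tables
flushes_per_day = 1_000    # hypothetical number of stats write-outs per day

# The stats cost recurs on every flush; the catalog cost is paid once per row
# and is then rarely rewritten.
stats_bytes_per_day = extra_stats_bytes_per_flush(tables) * flushes_per_day
catalog_bytes_total = extra_catalog_bytes(tables)
```

Under these assumed numbers the recurring stats cost is several orders of magnitude larger than the one-time catalog cost, which is the distinction being drawn above.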
> We did consider doing it at a per-table basis as well. But this is also an
> overhead that has to be paid forever, whereas the risk of having to read
> the database files more than once (because it'd only have to read them on
> the second pass, not write anything) is a one-off operation. And all
> those who initialized with checksums in the first place don't have to
> pay any overhead at all in the current design.

Why does it have to be paid forever?

The size of the pg_class row would be there forever. Granted, it's not that big an overhead given that there are already plenty of columns there. But the point is that you can never remove that column, and it will be there for users who never even considered running without checksums. It's certainly not a large overhead, but it's also not zero.


> I very strongly doubt it's a "very noticeable operational problem". People
> don't restart their databases very often... Let's say it takes 2-3 weeks to
> complete a run in a fairly large database. How many such large databases
> actually restart that frequently? I'm not sure I know of any. And the only
> effect of it is you have to start the process over (but read-only for the
> part you have already done). It's certainly not ideal, but I don't agree
> it's in any form a "very noticeable problem".

I definitely know large databases that fail over more frequently than
that.

I would argue that they have bigger issues than enabling checksums... By far. 
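
As a rough illustration of what a restart costs in the design described above (the process starts over, but the part already done only needs to be read, not written), here is a minimal sketch. The function name and the page counts are hypothetical, and it assumes every page is visited exactly once on the rerun.

```python
# Minimal sketch of the I/O for a rerun after an interrupted checksum pass.
# Assumption: pages that already got a checksum are read again but not rewritten.

def restart_io(total_pages: int, pages_done: int) -> dict:
    """Return the pages read and written by a full rerun after a restart."""
    if not 0 <= pages_done <= total_pages:
        raise ValueError("pages_done must be between 0 and total_pages")
    return {
        "pages_read": total_pages,                  # everything is scanned again
        "pages_written": total_pages - pages_done,  # only the unfinished part
    }


# Hypothetical example: the run was interrupted after 400 of 1000 pages.
rerun = restart_io(total_pages=1000, pages_done=400)
```

In this model the cost of an interruption is extra reads only; the write volume over all attempts stays at one write per page.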
