Обсуждение: reclaiming space from heavily used tables?

Поиск
Список
Период
Сортировка

reclaiming space from heavily used tables?

От
Rob Nikander
Дата:
Hi,

I've got a large table from which I'd like to completely reclaim space. I read the docs and it sounds like I can’t run
`vacuumfull`, because this table is accessed constantly and can’t have downtime. Assuming that’s true, what do you
thinkof the following idea? Is there a better alternative? 

1. Replace the table T with two tables T1 and T2 and a view T that is `select * from T1 union T2`.
2. Write only to T1, and slowly move records from T2 to T1.
3. When T2 is empty, redefine the view T to be simply `select * from T1`.
4. Recreate or vacuum full T2, so space is fully reclaimed.
5. Redefine view as the union select and repeat process going other direction from T1 to T2, as needed.

I guess one question is whether I can do 1 and 3 (renaming tables, redefining views) atomically without disturbing
concurrentselect queries. 

Rob



Re: reclaiming space from heavily used tables?

От
Jeremy Finzel
Дата:


On Mon, Dec 18, 2017 at 1:03 PM, Rob Nikander <rob.nikander@gmail.com> wrote:
Hi,

I've got a large table from which I'd like to completely reclaim space. I read the docs and it sounds like I can’t run `vacuum full`, because this table is accessed constantly and can’t have downtime. Assuming that’s true, what do you think of the following idea? Is there a better alternative?

1. Replace the table T with two tables T1 and T2 and a view T that is `select * from T1 union T2`.
2. Write only to T1, and slowly move records from T2 to T1.
3. When T2 is empty, redefine the view T to be simply `select * from T1`.
4. Recreate or vacuum full T2, so space is fully reclaimed.
5. Redefine view as the union select and repeat process going other direction from T1 to T2, as needed.

I guess one question is whether I can do 1 and 3 (renaming tables, redefining views) atomically without disturbing concurrent select queries.


This is what you want: https://github.com/reorg/pg_repack

This has been around for many years and is a very trusted extension (when will it be in core????).
 
Rob



Re: reclaiming space from heavily used tables?

От
Alvaro Herrera
Дата:
Jeremy Finzel wrote:

> This is what you want: https://github.com/reorg/pg_repack
> 
> This has been around for many years and is a very trusted extension (when
> will it be in core????).

It's been around, but is it trusted?  I for one do not trust it.  See
for example
https://www.postgresql.org/message-id/CA+TgmoaWoU+BpBG4nwz1LGkmdsbL6_Hm9r8jNqte-5VRDLsrkQ@mail.gmail.com

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: reclaiming space from heavily used tables?

От
Jeremy Finzel
Дата:
It's been around, but is it trusted?  I for one do not trust it.  See
for example
https://www.postgresql.org/message-id/CA+TgmoaWoU+BpBG4nwz1LGkmdsbL6_Hm9r8jNqte-5VRDLsrkQ@mail.gmail.com


Needs some discussion.  Has anyone actually reported corruption related to this?  I don't doubt the opinion, but I do wonder why I have never seen reports of corruption related to this when people have readily reported such things in years past about pg_repack.  If it (apparently) works so well in practice for so many companies, but has this bug, why not leverage what they have done and fix its bugs when it serves such an important function?  The extension serves a huge need on many fronts.

Re: reclaiming space from heavily used tables?

От
Alvaro Herrera
Дата:
Jeremy Finzel wrote:
> >
> > It's been around, but is it trusted?  I for one do not trust it.  See
> > for example
> > https://www.postgresql.org/message-id/CA+TgmoaWoU+BpBG4nwz1L
> > GkmdsbL6_Hm9r8jNqte-5VRDLsrkQ@mail.gmail.com
>
> Needs some discussion.  Has anyone actually reported corruption related to
> this?  I don't doubt the opinion, but I do wonder why I have never seen
> reports of corruption related to this when people have readily reported
> such things in years past about pg_repack.

Maybe the nature of the corruption caused is different.  It took months
of running large databases on production for corruption to become
apparent from multixact bugs, for example.  Or maybe because the
relfrozenxid is fixed by other activity in the system, any bugs are
masked -- but that you could get in trouble if autovacuum is disabled,
perhaps.  Or maybe it's masked even in that case, since autovacuum
observes that the table has a very old frozenxid and runs a forced
vacuum on that table anyway.

> If it (apparently) works so well in practice for so many companies,
> but has this bug, why not leverage what they have done [...] ?

What have they done?

> [...] and fix its bugs when it serves such an important function?  The
> extension serves a huge need on many fronts.

It's not PGDG's tool, so these are not "our" bugs to fix.  Though it's
pretty clear that we need *something* to cover those needs ... but
nothing has been submitted to commitfests.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: reclaiming space from heavily used tables?

От
Jeremy Finzel
Дата:
Maybe the nature of the corruption caused is different.  It took months
of running large databases on production for corruption to become
apparent from multixact bugs, for example.  Or maybe because the
relfrozenxid is fixed by other activity in the system, any bugs are
masked -- but that you could get in trouble if autovacuum is disabled,
perhaps.  Or maybe it's masked even in that case, since autovacuum
observes that the table has a very old frozenxid and runs a forced
vacuum on that table anyway.

Or maybe there isn’t actual corruption. There is debate in the Git issue discussed over this... I’m not sure.  But whatever the case may be, I don’t find it useful to throw out a tool out of hand because of what in theory appears to be a huge bug but which in reality has not happened to a very broad install base of users. pg_repack appears to be the number one way people solve this kind of problem in very critical production environments. So I don’t find it reasonable to just throw out the tool.

> If it (apparently) works so well in practice for so many companies,
> but has this bug, why not leverage what they have done [...] ?

What have they done?

Made a tool that can cluster tables, remove bloat and reindex, and move tables between tablespaces with no downtime.


> [...] and fix its bugs when it serves such an important function?  The
> extension serves a huge need on many fronts.

It's not PGDG's tool, so these are not "our" bugs to fix.  Though it's
pretty clear that we need *something* to cover those needs ... but
nothing has been submitted to commitfests.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

I’m not trying to point fingers or say who should do what.  My point is that this tool solves some important production challenges and it would be better to leverage this code base and maybe fix some issues then try to write something completely new. And that is largely because there has already been huge production usage of this tool for years which is already exposed many issues that have been fixed.
Many thanks for the feedback,
Jeremy 

Re: reclaiming space from heavily used tables?

От
Rob Nikander
Дата:
I don’t see how I can use that extension. If some people are saying it’s dangerous and others are saying it’s fine, I
don’thave the time to drill down into PG internals so that I can judge for myself. I’ll probably try the two table idea
outlinedin my original message. 

> On Dec 18, 2017, at 5:24 PM, Jeremy Finzel <finzelj@gmail.com> wrote:
> […]
>> On Dec 18, 2017, at 2:51 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>> […]