Обсуждение: reclaiming space from heavily used tables?
Hi, I've got a large table from which I'd like to completely reclaim space. I read the docs and it sounds like I can’t run `vacuumfull`, because this table is accessed constantly and can’t have downtime. Assuming that’s true, what do you thinkof the following idea? Is there a better alternative? 1. Replace the table T with two tables T1 and T2 and a view T that is `select * from T1 union T2`. 2. Write only to T1, and slowly move records from T2 to T1. 3. When T2 is empty, redefine the view T to be simply `select * from T1`. 4. Recreate or vacuum full T2, so space is fully reclaimed. 5. Redefine view as the union select and repeat process going other direction from T1 to T2, as needed. I guess one question is whether I can do 1 and 3 (renaming tables, redefining views) atomically without disturbing concurrentselect queries. Rob
On Mon, Dec 18, 2017 at 1:03 PM, Rob Nikander <rob.nikander@gmail.com> wrote:
Hi,
I've got a large table from which I'd like to completely reclaim space. I read the docs and it sounds like I can’t run `vacuum full`, because this table is accessed constantly and can’t have downtime. Assuming that’s true, what do you think of the following idea? Is there a better alternative?
1. Replace the table T with two tables T1 and T2 and a view T that is `select * from T1 union T2`.
2. Write only to T1, and slowly move records from T2 to T1.
3. When T2 is empty, redefine the view T to be simply `select * from T1`.
4. Recreate or vacuum full T2, so space is fully reclaimed.
5. Redefine view as the union select and repeat process going other direction from T1 to T2, as needed.
I guess one question is whether I can do 1 and 3 (renaming tables, redefining views) atomically without disturbing concurrent select queries.
This is what you want: https://github.com/reorg/pg_repack
This has been around for many years and is a very trusted extension (when will it be in core????).
Rob
Jeremy Finzel wrote: > This is what you want: https://github.com/reorg/pg_repack > > This has been around for many years and is a very trusted extension (when > will it be in core????). It's been around, but is it trusted? I for one do not trust it. See for example https://www.postgresql.org/message-id/CA+TgmoaWoU+BpBG4nwz1LGkmdsbL6_Hm9r8jNqte-5VRDLsrkQ@mail.gmail.com -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
It's been around, but is it trusted? I for one do not trust it. See
for example
https://www.postgresql.org/message-id/CA+TgmoaWoU+BpBG4nwz1L GkmdsbL6_Hm9r8jNqte-5VRDLsrkQ@ mail.gmail.com
Needs some discussion. Has anyone actually reported corruption related to this? I don't doubt the opinion, but I do wonder why I have never seen reports of corruption related to this when people have readily reported such things in years past about pg_repack. If it (apparently) works so well in practice for so many companies, but has this bug, why not leverage what they have done and fix its bugs when it serves such an important function? The extension serves a huge need on many fronts.
Jeremy Finzel wrote: > > > > It's been around, but is it trusted? I for one do not trust it. See > > for example > > https://www.postgresql.org/message-id/CA+TgmoaWoU+BpBG4nwz1L > > GkmdsbL6_Hm9r8jNqte-5VRDLsrkQ@mail.gmail.com > > Needs some discussion. Has anyone actually reported corruption related to > this? I don't doubt the opinion, but I do wonder why I have never seen > reports of corruption related to this when people have readily reported > such things in years past about pg_repack. Maybe the nature of the corruption caused is different. It took months of running large databases on production for corruption to become apparent from multixact bugs, for example. Or maybe because the relfrozenxid is fixed by other activity in the system, any bugs are masked -- but that you could get in trouble if autovacuum is disabled, perhaps. Or maybe it's masked even in that case, since autovacuum observes that the table has a very old frozenxid and runs a forced vacuum on that table anyway. > If it (apparently) works so well in practice for so many companies, > but has this bug, why not leverage what they have done [...] ? What have they done? > [...] and fix its bugs when it serves such an important function? The > extension serves a huge need on many fronts. It's not PGDG's tool, so these are not "our" bugs to fix. Though it's pretty clear that we need *something* to cover those needs ... but nothing has been submitted to commitfests. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Maybe the nature of the corruption caused is different. It took months
of running large databases on production for corruption to become
apparent from multixact bugs, for example. Or maybe because the
relfrozenxid is fixed by other activity in the system, any bugs are
masked -- but that you could get in trouble if autovacuum is disabled,
perhaps. Or maybe it's masked even in that case, since autovacuum
observes that the table has a very old frozenxid and runs a forced
vacuum on that table anyway.
Or maybe there isn’t actual corruption. There is debate in the Git issue discussed over this... I’m not sure. But whatever the case may be, I don’t find it useful to throw out a tool out of hand because of what in theory appears to be a huge bug but which in reality has not happened to a very broad install base of users. pg_repack appears to be the number one way people solve this kind of problem in very critical production environments. So I don’t find it reasonable to just throw out the tool.
> If it (apparently) works so well in practice for so many companies,
> but has this bug, why not leverage what they have done [...] ?
What have they done?
Made a tool that can cluster tables, remove bloat and reindex, and move tables between tablespaces with no downtime.
> [...] and fix its bugs when it serves such an important function? The
> extension serves a huge need on many fronts.
It's not PGDG's tool, so these are not "our" bugs to fix. Though it's
pretty clear that we need *something* to cover those needs ... but
nothing has been submitted to commitfests.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
I’m not trying to point fingers or say who should do what. My point is that this tool solves some important production challenges and it would be better to leverage this code base and maybe fix some issues then try to write something completely new. And that is largely because there has already been huge production usage of this tool for years which is already exposed many issues that have been fixed.
Many thanks for the feedback,
Jeremy
I don’t see how I can use that extension. If some people are saying it’s dangerous and others are saying it’s fine, I don’thave the time to drill down into PG internals so that I can judge for myself. I’ll probably try the two table idea outlinedin my original message. > On Dec 18, 2017, at 5:24 PM, Jeremy Finzel <finzelj@gmail.com> wrote: > […] >> On Dec 18, 2017, at 2:51 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: >> […]