Обсуждение: COPY FREEZE and PD_ALL_VISIBLE

Поиск
Список
Период
Сортировка

COPY FREEZE and PD_ALL_VISIBLE

От
Jeff Janes
Дата:
I'm planning on adding a todo item to have COPY FREEZE set PD_ALL_VISIBLE.  Or is there some reason this can't be done?

Since the whole point of COPY FREEZE is to avoid needing to rewrite the entire table, it seems rather perverse that the first time the table is vacuumed, it needs to rewrite the entire table.


Cheers,

Jeff

Re: COPY FREEZE and PD_ALL_VISIBLE

От
Robert Haas
Дата:
On Sun, Oct 18, 2015 at 5:23 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> I'm planning on adding a todo item to have COPY FREEZE set PD_ALL_VISIBLE.
> Or is there some reason this can't be done?
>
> Since the whole point of COPY FREEZE is to avoid needing to rewrite the
> entire table, it seems rather perverse that the first time the table is
> vacuumed, it needs to rewrite the entire table.

*facepalm*

I don't know how hard that is to implement, but +1 for trying to
figure out a way.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: COPY FREEZE and PD_ALL_VISIBLE

От
Jeff Janes
Дата:
On Tue, Oct 20, 2015 at 7:02 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Sun, Oct 18, 2015 at 5:23 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> I'm planning on adding a todo item to have COPY FREEZE set PD_ALL_VISIBLE.
> Or is there some reason this can't be done?
>
> Since the whole point of COPY FREEZE is to avoid needing to rewrite the
> entire table, it seems rather perverse that the first time the table is
> vacuumed, it needs to rewrite the entire table.

*facepalm*

I don't know how hard that is to implement, but +1 for trying to
figure out a way.


It turns out it was pretty easy to set PD_ALL_VISIBLE on the new pages, since the code in hio that requests the relation to be extended already has info on the tuple's intended freeze status.

Then you just need to refrain from clearing PD_ALL_VISIBLE when that tuple is actually written into the page.  Not only because clearing would defeat the purpose, but also because it will cause an error--apparently the incipient page is not yet in a state where visibilitymap_clear is willing to deal with it.

With this patch, you get a table which has PD_ALL_VISIBLE set for all pages, but which doesn't have a _vm file. Index-only scans will visit the heap for each tuple until the first VACUUM is done.

The first vacuum will read the entire table, but not need to write it anymore.  And will create the _vm file.

I think we really want to create _vm file as well as set PD_ALL_VISIBLE, but I don't know the best way to do that.  Set a flag somewhere and then create it in bulk at the end of the transaction?  Set it bit by bit as the pages are extended and initialized?
 
Cheers,

Jeff
Вложения

Re: COPY FREEZE and PD_ALL_VISIBLE

От
Robert Haas
Дата:
On Wed, Oct 21, 2015 at 1:31 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> It turns out it was pretty easy to set PD_ALL_VISIBLE on the new pages,
> since the code in hio that requests the relation to be extended already has
> info on the tuple's intended freeze status.
>
> Then you just need to refrain from clearing PD_ALL_VISIBLE when that tuple
> is actually written into the page.  Not only because clearing would defeat
> the purpose, but also because it will cause an error--apparently the
> incipient page is not yet in a state where visibilitymap_clear is willing to
> deal with it.

Wouldn't it be better to instead fill the page with tuples first and
THEN set PD_ALL_VISIBLE?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: COPY FREEZE and PD_ALL_VISIBLE

От
Simon Riggs
Дата:
On 18 October 2015 at 17:23, Jeff Janes <jeff.janes@gmail.com> wrote:
I'm planning on adding a todo item to have COPY FREEZE set PD_ALL_VISIBLE.  Or is there some reason this can't be done?

Since the whole point of COPY FREEZE is to avoid needing to rewrite the entire table, it seems rather perverse that the first time the table is vacuumed, it needs to rewrite the entire table.

That's pretty darn weird. I remember measuring the benefit during testing. I remember Jeff D was talking about removing that flag at that time, so perhaps that's it.

Either way, my bug, my bad, thanks for the report.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: COPY FREEZE and PD_ALL_VISIBLE

От
Simon Riggs
Дата:
On 21 October 2015 at 13:31, Jeff Janes <jeff.janes@gmail.com> wrote:
On Tue, Oct 20, 2015 at 7:02 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Sun, Oct 18, 2015 at 5:23 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> I'm planning on adding a todo item to have COPY FREEZE set PD_ALL_VISIBLE.
> Or is there some reason this can't be done?
>
> Since the whole point of COPY FREEZE is to avoid needing to rewrite the
> entire table, it seems rather perverse that the first time the table is
> vacuumed, it needs to rewrite the entire table.

*facepalm*

I don't know how hard that is to implement, but +1 for trying to
figure out a way.


It turns out it was pretty easy to set PD_ALL_VISIBLE on the new pages, since the code in hio that requests the relation to be extended already has info on the tuple's intended freeze status.

Then you just need to refrain from clearing PD_ALL_VISIBLE when that tuple is actually written into the page.  Not only because clearing would defeat the purpose, but also because it will cause an error--apparently the incipient page is not yet in a state where visibilitymap_clear is willing to deal with it.

With this patch, you get a table which has PD_ALL_VISIBLE set for all pages, but which doesn't have a _vm file.

Patch is simple enough. All usage looks safe, so I reckon this is good.
 

Index-only scans will visit the heap for each tuple until the first VACUUM is done.

The first vacuum will read the entire table, but not need to write it anymore.  And will create the _vm file.

I think we really want to create _vm file as well as set PD_ALL_VISIBLE, but I don't know the best way to do that.  Set a flag somewhere and then create it in bulk at the end of the transaction?  Set it bit by bit as the pages are extended and initialized?

Easy enough to do it at the end of the COPY FREEZE in one step. No need to wait until EOXact.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: COPY FREEZE and PD_ALL_VISIBLE

От
Amit Kapila
Дата:
On Fri, Oct 23, 2015 at 6:29 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
On 21 October 2015 at 13:31, Jeff Janes <jeff.janes@gmail.com> wrote:

Index-only scans will visit the heap for each tuple until the first VACUUM is done.

The first vacuum will read the entire table, but not need to write it anymore.  And will create the _vm file.

I think we really want to create _vm file as well as set PD_ALL_VISIBLE, but I don't know the best way to do that.  Set a flag somewhere and then create it in bulk at the end of the transaction?  Set it bit by bit as the pages are extended and initialized?

Easy enough to do it at the end of the COPY FREEZE in one step.

Here, we might want to consider that setting bit in visibility map
will generate WAL log whereas Copy Freeze otherwise skip WAL
when wal_level is less than archive.  This can lead to extra disk
writes which can slow down Copy Freeze, but OTOH that might
be acceptable.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: COPY FREEZE and PD_ALL_VISIBLE

От
Amit Kapila
Дата:
On Fri, Oct 23, 2015 at 2:46 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Wed, Oct 21, 2015 at 1:31 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> > It turns out it was pretty easy to set PD_ALL_VISIBLE on the new pages,
> > since the code in hio that requests the relation to be extended already has
> > info on the tuple's intended freeze status.
> >
> > Then you just need to refrain from clearing PD_ALL_VISIBLE when that tuple
> > is actually written into the page.  Not only because clearing would defeat
> > the purpose, but also because it will cause an error--apparently the
> > incipient page is not yet in a state where visibilitymap_clear is willing to
> > deal with it.
>
> Wouldn't it be better to instead fill the page with tuples first and
> THEN set PD_ALL_VISIBLE?
>

Ideally that would be better and if we want to do that way, then I think it
needs to be done at end of Copy Freeze operation, also for that we
need to mark the buffer as dirty again (do you see any other way
of achieving the same?).  OTOH, I don't see any harm even if we do it
in the way as done in patch.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: COPY FREEZE and PD_ALL_VISIBLE

От
Simon Riggs
Дата:
On 3 November 2015 at 15:23, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Oct 23, 2015 at 6:29 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
On 21 October 2015 at 13:31, Jeff Janes <jeff.janes@gmail.com> wrote:

Index-only scans will visit the heap for each tuple until the first VACUUM is done.

The first vacuum will read the entire table, but not need to write it anymore.  And will create the _vm file.

I think we really want to create _vm file as well as set PD_ALL_VISIBLE, but I don't know the best way to do that.  Set a flag somewhere and then create it in bulk at the end of the transaction?  Set it bit by bit as the pages are extended and initialized?

Easy enough to do it at the end of the COPY FREEZE in one step.

Here, we might want to consider that setting bit in visibility map
will generate WAL log whereas Copy Freeze otherwise skip WAL
when wal_level is less than archive.  This can lead to extra disk
writes which can slow down Copy Freeze, but OTOH that might
be acceptable.

I'm building the map as I go, in the latest version of this patch I'm working on. 

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: COPY FREEZE and PD_ALL_VISIBLE

От
Amit Kapila
Дата:
On Tue, Nov 3, 2015 at 8:07 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>
> On 3 November 2015 at 15:23, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>
>> On Fri, Oct 23, 2015 at 6:29 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>>
>>> Easy enough to do it at the end of the COPY FREEZE in one step.
>>
>>
>> Here, we might want to consider that setting bit in visibility map
>> will generate WAL log whereas Copy Freeze otherwise skip WAL
>> when wal_level is less than archive.  This can lead to extra disk
>> writes which can slow down Copy Freeze, but OTOH that might
>> be acceptable.
>
>
> I'm building the map as I go, in the latest version of this patch I'm working on.
>

As, you are working on this patch, I have marked it as "Waiting on Author".

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: COPY FREEZE and PD_ALL_VISIBLE

От
Michael Paquier
Дата:
On Wed, Nov 4, 2015 at 12:08 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Tue, Nov 3, 2015 at 8:07 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>
>> On 3 November 2015 at 15:23, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>>
>>> On Fri, Oct 23, 2015 at 6:29 AM, Simon Riggs <simon@2ndquadrant.com>
>>> wrote:
>>>>
>>>> Easy enough to do it at the end of the COPY FREEZE in one step.
>>>
>>>
>>> Here, we might want to consider that setting bit in visibility map
>>> will generate WAL log whereas Copy Freeze otherwise skip WAL
>>> when wal_level is less than archive.  This can lead to extra disk
>>> writes which can slow down Copy Freeze, but OTOH that might
>>> be acceptable.
>>
>>
>> I'm building the map as I go, in the latest version of this patch I'm
>> working on.
>>
>
> As, you are working on this patch, I have marked it as "Waiting on Author".

There has not been an update for this patch in a while, switched to
"returned with feedback" because of lack of author's input.
-- 
Michael



Re: COPY FREEZE and PD_ALL_VISIBLE

От
Jeff Janes
Дата:
On Wed, Dec 23, 2015 at 6:14 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Wed, Nov 4, 2015 at 12:08 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> On Tue, Nov 3, 2015 at 8:07 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>>
>>> On 3 November 2015 at 15:23, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>>>
>>>> On Fri, Oct 23, 2015 at 6:29 AM, Simon Riggs <simon@2ndquadrant.com>
>>>> wrote:
>>>>>
>>>>> Easy enough to do it at the end of the COPY FREEZE in one step.
>>>>
>>>>
>>>> Here, we might want to consider that setting bit in visibility map
>>>> will generate WAL log whereas Copy Freeze otherwise skip WAL
>>>> when wal_level is less than archive.  This can lead to extra disk
>>>> writes which can slow down Copy Freeze, but OTOH that might
>>>> be acceptable.
>>>
>>>
>>> I'm building the map as I go, in the latest version of this patch I'm
>>> working on.
>>>
>>
>> As, you are working on this patch, I have marked it as "Waiting on Author".
>
> There has not been an update for this patch in a while, switched to
> "returned with feedback" because of lack of author's input.

And I've added Simon as an author, as I don't plan on picking it up
again myself as long as he is working on his alternative.

Thanks,

Jeff



Re: COPY FREEZE and PD_ALL_VISIBLE

От
Jeff Janes
Дата:
On Tue, Nov 3, 2015 at 6:37 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 3 November 2015 at 15:23, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>
>> On Fri, Oct 23, 2015 at 6:29 AM, Simon Riggs <simon@2ndquadrant.com>
>> wrote:
>>>
>>> On 21 October 2015 at 13:31, Jeff Janes <jeff.janes@gmail.com> wrote:
>>>
>>>> Index-only scans will visit the heap for each tuple until the first
>>>> VACUUM is done.
>>>>
>>>> The first vacuum will read the entire table, but not need to write it
>>>> anymore.  And will create the _vm file.
>>>>
>>>> I think we really want to create _vm file as well as set PD_ALL_VISIBLE,
>>>> but I don't know the best way to do that.  Set a flag somewhere and then
>>>> create it in bulk at the end of the transaction?  Set it bit by bit as the
>>>> pages are extended and initialized?
>>>
>>>
>>> Easy enough to do it at the end of the COPY FREEZE in one step.
>>
>>
>> Here, we might want to consider that setting bit in visibility map
>> will generate WAL log whereas Copy Freeze otherwise skip WAL
>> when wal_level is less than archive.  This can lead to extra disk
>> writes which can slow down Copy Freeze, but OTOH that might
>> be acceptable.
>
>
> I'm building the map as I go, in the latest version of this patch I'm
> working on.

Hi Simon,

Is this still on your radar?  If you would like someone else to pick
it up, can you post the WIP patch you have?

Thanks,

Jeff



Re: [HACKERS] COPY FREEZE and PD_ALL_VISIBLE

От
Darafei "Komяpa" Praliaskouski
Дата:
Hello,

Today I bumped into need to limit first VACUUM time on data import.
I'm using utility called osmium together with COPY FREEZE to import openstreetmap data into database.

osmium export -c osmium.config -f pg belarus-latest.osm.pbf  -v --progress | psql -1 -c 'create table byosm(geom geometry, osm_type text, osm_id bigint, tags jsonb);copy byosm from stdin freeze;'

However, first pass of VACUUM rewrites the whole table. Here is two logs of VACUUM VERBOSE in a row:


In Russian Postgres Telegram group I've been recommended this thread. 
Can the patch be revived? What is needed to get it up for 12?

On Sun, Aug 14, 2016 at 10:37 PM Jeff Janes <jeff.janes@gmail.com> wrote:
On Tue, Nov 3, 2015 at 6:37 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 3 November 2015 at 15:23, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>
>> On Fri, Oct 23, 2015 at 6:29 AM, Simon Riggs <simon@2ndquadrant.com>
>> wrote:
>>>
>>> On 21 October 2015 at 13:31, Jeff Janes <jeff.janes@gmail.com> wrote:
>>>
>>>> Index-only scans will visit the heap for each tuple until the first
>>>> VACUUM is done.
>>>>
>>>> The first vacuum will read the entire table, but not need to write it
>>>> anymore.  And will create the _vm file.
>>>>
>>>> I think we really want to create _vm file as well as set PD_ALL_VISIBLE,
>>>> but I don't know the best way to do that.  Set a flag somewhere and then
>>>> create it in bulk at the end of the transaction?  Set it bit by bit as the
>>>> pages are extended and initialized?
>>>
>>>
>>> Easy enough to do it at the end of the COPY FREEZE in one step.
>>
>>
>> Here, we might want to consider that setting bit in visibility map
>> will generate WAL log whereas Copy Freeze otherwise skip WAL
>> when wal_level is less than archive.  This can lead to extra disk
>> writes which can slow down Copy Freeze, but OTOH that might
>> be acceptable.
>
>
> I'm building the map as I go, in the latest version of this patch I'm
> working on.

Hi Simon,

Is this still on your radar?  If you would like someone else to pick
it up, can you post the WIP patch you have?

Thanks,

Jeff


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


--
Darafei Praliaskouski

Re: [HACKERS] COPY FREEZE and PD_ALL_VISIBLE

От
Pavan Deolasee
Дата:

On Tue, Jan 15, 2019 at 8:48 PM Darafei "Komяpa" Praliaskouski <me@komzpa.net> wrote:
Hello,

Today I bumped into need to limit first VACUUM time on data import.
I'm using utility called osmium together with COPY FREEZE to import openstreetmap data into database.

osmium export -c osmium.config -f pg belarus-latest.osm.pbf  -v --progress | psql -1 -c 'create table byosm(geom geometry, osm_type text, osm_id bigint, tags jsonb);copy byosm from stdin freeze;'

However, first pass of VACUUM rewrites the whole table. Here is two logs of VACUUM VERBOSE in a row:


In Russian Postgres Telegram group I've been recommended this thread. 
Can the patch be revived? What is needed to get it up for 12?

I posted a new patch [1] for consideration to include in PG12. I started a new thread because the patch is completely new and this thread was a bit too old.

Thanks,
Pavan


--
 Pavan Deolasee                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services