RE: OK, OK, Hiroshi's right: use a seperately-generated filename

Поиск
Список
Период
Сортировка
От Hiroshi Inoue
Тема RE: OK, OK, Hiroshi's right: use a seperately-generated filename
Дата
Msg-id EKEJJICOHDIEMGPNIFIJIEAKCCAA.Inoue@tpf.co.jp
обсуждение исходный текст
Ответ на OK, OK, Hiroshi's right: use a seperately-generated filename  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: OK, OK, Hiroshi's right: use a seperately-generated filename  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
> -----Original Message-----
> From: pgsql-hackers-owner@hub.org [mailto:pgsql-hackers-owner@hub.org]On
> Behalf Of Tom Lane
> 
> After further thought I think there's a lot of merit in Hiroshi's
> opinion that physical file names should not be tied to relation OID.
> If we use a separately generated value for the file name, we can
> solve a lot of problems pretty nicely by means of "table versioning".
> 
> For example: VACUUM can't compact indexes at the moment, and what it
> does do (scan the index and delete unused entries) is really slow.
> The right thing to do is for it to generate an all-new index file,
> but how do we do that without creating a risk of leaving the index
> corrupted if we crash partway through?  The answer is to build the
> new index in a new physical file.  But how do we install the new
> file as the real index atomically, when it might span multiple
> segments?  If the physical file name is decoupled from the relation's
> name *and* OID then there is no problem: the atomic event that makes
> the new file(s) the real table contents is the commit of the new
> pg_class row with the new value for the physical filename.
> 
> Aside from possible improvements in VACUUM, this would let us do a
> robust implementation of CLUSTER, and we could do the "really change
> the table" variant of ALTER TABLE DROP COLUMN the same way if anyone
> wants to do it.
>

Yes,I've wondered how do we implement column_is_really_dropped 
ALTER TABLE DROP COLUMN feature without this kind of mechanism.

> The only cost is that we need an additional column in pg_class to
> hold the physical file name.  That's not so bad, especially when
> you remember that we'd surely need to add something to pg_class for
> tablespace support anyway.
> 
> If we bite that bullet, then we could also do something to satisfy
> Bruce about having legible file names ;-).  The column in pg_class
> could perfectly well be a string, not a pure number, and that means
> that we can throw in the relname (truncated to fit of course).  So
> the thing would act a lot like the original-relname-plus-OID variant
> that's been discussed so far.  (Original relname because ALTER TABLE
> RENAME would *not* change the physical file name.  But we could
> think about a form of VACUUM that creates a whole new table by
> versioning, and that would presumably bring the physical name back
> in sync with the logical relname.)
> 
> As Hiroshi pointed out, one of the best aspects of this approach
> is that the physical table layout policy doesn't have to be hard-wired
> into low-level file access routines.  The low-level routines don't
> need to know much of anything about the format of the pathname,
> they just stuff in the right segment number and use the name.  The
> layout policy need only be known to one single routine that generates
> the strings that go into pg_class.  So it'd be really easy to change.
>

Ross's approach is fundamentally same though he is using relname+OID
naming rule.  I've said his trial is most practical one.
> One thing we'd have to work out is that the critical system tables
> (eg, pg_class itself, as well as its indexes) would have to have
> predictable physical names.

The only limitation of the relation filename is the uniqueness.
So it doesn't introduce any inconsistency that system tables
have fixed name.
As for system relations it wouldn't be so bad because CLUSTER/
ALTER TABLE DROP COLUMN ... would be unnecessary(maybe).
But as for system indexes,it is preferable that VACUUM/REINDEX
could rebuild them safely. System indexes never shrink currently.

Regards.

Hiroshi Inoue
Inoue@tpf.co.jp


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: OK, OK, Hiroshi's right: use a seperately-generated filename
Следующее
От: "Hiroshi Inoue"
Дата:
Сообщение: RE: Big 7.1 open items