Обсуждение: tuptoaster.c must *not* use SnapshotAny

Поиск
Список
Период
Сортировка

tuptoaster.c must *not* use SnapshotAny

От
Tom Lane
Дата:
The tuple toaster currently fetches toast-table rows using SnapshotAny.
This is quite uncool, because it will in fact find *any* row.  Including
dead rows resulting from an aborted (or, in 7.2, still-in-progress)
VACUUM.

I believe this is the explanation for a problem report I'm currently
chasing from Joshua Drake:

digivision2=# select * from change_log;
ERROR:  chunk 1 for toast value 3388774 appears multiple times

(there are some thousands of rows with similar problems)

I suggest SnapshotSelf instead.  Comments anyone?
        regards, tom lane


Re: tuptoaster.c must *not* use SnapshotAny

От
"Command Prompt, Inc."
Дата:
Josh Drake here: Although I know that development has pretty much ceased
on the 7.1 series. This bug, is potentially very serious for those who
have the problem.

The last time this came up (that I can find) is in August of 2001, so the
bug is not frequent that we are aware of. However, as it appears the fix
is simple should we apply a 7.1.3.1 or 7.1.4 that fixes this issue?


J


> On Tue, 15 Jan 2002, Tom Lane wrote:

> The tuple toaster currently fetches toast-table rows using SnapshotAny.
> This is quite uncool, because it will in fact find *any* row.  Including
> dead rows resulting from an aborted (or, in 7.2, still-in-progress)
> VACUUM.
>
> I believe this is the explanation for a problem report I'm currently
> chasing from Joshua Drake:
>
> digivision2=# select * from change_log;
> ERROR:  chunk 1 for toast value 3388774 appears multiple times
>
> (there are some thousands of rows with similar problems)
>
> I suggest SnapshotSelf instead.  Comments anyone?
>
>             regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo@postgresql.org so that your
> message can get through to the mailing list cleanly
>

-- 
by way of pgsql-hackers@commandprompt.com



Re: tuptoaster.c must *not* use SnapshotAny

От
Tom Lane
Дата:
"Command Prompt, Inc." <pgsql-hackers@commandprompt.com> writes:
> Josh Drake here: Although I know that development has pretty much ceased
> on the 7.1 series. This bug, is potentially very serious for those who
> have the problem.

Well, based on our experiments last night, recovery is pretty simple:
vacuum the affected table.  So it doesn't qualify as a critical problem
in my eyes, although surely it's a "must fix" for 7.2.

> The last time this came up (that I can find) is in August of 2001, so the
> bug is not frequent that we are aware of. However, as it appears the fix
> is simple should we apply a 7.1.3.1 or 7.1.4 that fixes this issue?

I don't think anyone has the energy to put out a 7.1.4.  We seem to have
enough trouble getting 7.2 out the door.
        regards, tom lane


Re: tuptoaster.c must *not* use SnapshotAny

От
"Zeugswetter Andreas SB SD"
Дата:
> The tuple toaster currently fetches toast-table rows using SnapshotAny.
> This is quite uncool, because it will in fact find *any* row.

Well, doesn't it need to find *any* row, since the accesses to heap and toast
are not "atomic" (not interlocked) ?

>  Including dead rows resulting from an aborted

But then also the heap tuple is aborted, and thus a normal select will not
select that toast anyway.

> (or, in 7.2, still-in-progress)
> VACUUM.

I thought new vacuum did not move tuples ? Why should it then produce 
two rows ? And VACUUM FULL locks the table so ...

I think SnapshotSelf could have concurrency problems (not see a toast
that someone else committed between our heap read and toast read)
IIRC, at the time at least SnapshotAny was chosen on purpose.

Might it be a problem, that VACUUM sets xids to FrozenXid, and we always
consider that one Valid, even if that vacuum is still in progress ? 

I can only see a problem with a partway through vacuum full, that aborted.

Andreas

PS: short desc between Any and Self, anybody ?


Re: tuptoaster.c must *not* use SnapshotAny

От
"Zeugswetter Andreas SB SD"
Дата:
> You're right, lazy VACUUM doesn't create this issue.  My mistake.
> But VACUUM FULL does.
> 
> > I can only see a problem with a partway through vacuum 
> full, that aborted.
> 
> Exactly.

Can you try give me a hint, why an aborted VACUUM FULL will not allways be 
a problem (also for other operations) until you run another VACUUM FULL
that succeeds ?

How do we know, that a (newly) FrozenXid tuple does not still have 
a (visible) duplicate ?

Andreas


Re: tuptoaster.c must *not* use SnapshotAny

От
Tom Lane
Дата:
"Zeugswetter Andreas SB SD" <ZeugswetterA@spardat.at> writes:
>> The tuple toaster currently fetches toast-table rows using SnapshotAny.
>> This is quite uncool, because it will in fact find *any* row.

> Well, doesn't it need to find *any* row, since the accesses to heap and toast
> are not "atomic" (not interlocked) ?

No; it needs to *not* find dead rows resulting from a failed vacuum.
You are right that for the most part TOAST relies on time qualification
of the main-table row, and no doubt that was why Jan thought he could
code it like this; but vacuuming the TOAST table is not an operation
that affects the main table.

>> Including dead rows resulting from an aborted

> But then also the heap tuple is aborted, and thus a normal select will not
> select that toast anyway.

Wrong; see above.  We are talking about reshuffling rows in the TOAST
table that belong (presumably) to a live row in the main table.  During
the VACUUM there will be more than one copy of such rows.

>> (or, in 7.2, still-in-progress)
>> VACUUM.

> I thought new vacuum did not move tuples ? Why should it then produce 
> two rows ? And VACUUM FULL locks the table so ...

You're right, lazy VACUUM doesn't create this issue.  My mistake.
But VACUUM FULL does.

> I can only see a problem with a partway through vacuum full, that aborted.

Exactly.
        regards, tom lane


Re: tuptoaster.c must *not* use SnapshotAny

От
Tom Lane
Дата:
"Zeugswetter Andreas SB SD" <ZeugswetterA@spardat.at> writes:
> Can you try give me a hint, why an aborted VACUUM FULL will not allways be 
> a problem (also for other operations) until you run another VACUUM FULL
> that succeeds ?

Because noplace else uses SnapshotAny, basically.  The only really
legitimate use I can see for it is in Tatsuo's contrib/pgstattuple
hack: that wants to count both visible and invisible tuples.

> How do we know, that a (newly) FrozenXid tuple does not still have 
> a (visible) duplicate ?

It's *not* visible, if you are applying any visibility checks whatever.
But SnapshotAny bypasses all visibility checking.

I forgot to answer your question about what these mean.  See 
src/backend/utils/time/tqual.c for the gory details, but if you
are willing to accept the summary comments in that file:

Any        No visibility check whatever

Self        Other committed xacts + effects of this xact up to    *and including* current command

Now        Other committed xacts + effects of this xact up to    *but not including* current command

Dirty        like Self, but can see effects of other in-progress    xacts too

Snapshot    like Now, but specified other xacts are considered    uncommitted even if they've actually committed by
now

There are also special visibility rules for Update and Vacuum, which
are not really different from the normal rules, they just return more
information than "visible" or "not visible".  (Dirty returns some
side information too.)

Hmm.  Now that I look at it, Self probably isn't quite right either;
if we are reading a main-table tuple that's committed dead but is
still visible according to our snapshot, we need to be able to see
its toast tuples too; but they're committed dead as well.  Sigh.
I think we need a special visibility rule for TOAST, that only
checks for HEAP_MOVED_IN/HEAP_MOVED_OFF (the bits that take care
of VACUUM moves).

Comments?
        regards, tom lane


Re: tuptoaster.c must *not* use SnapshotAny

От
Jan Wieck
Дата:
Tom Lane wrote:
> The tuple toaster currently fetches toast-table rows using SnapshotAny.
> This is quite uncool, because it will in fact find *any* row.  Including
> dead rows resulting from an aborted (or, in 7.2, still-in-progress)
> VACUUM.
   Really?  So  what  you're  saying  means that while vacuum is   running there's a window where one tuple can  exist
multiple  times  and  just  the  fact  that  other backends can see the   vacuum transaction alive prevents them from
showingup?  That   doesn't sound really good to me.
 


Jan

>
> I believe this is the explanation for a problem report I'm currently
> chasing from Joshua Drake:
>
> digivision2=# select * from change_log;
> ERROR:  chunk 1 for toast value 3388774 appears multiple times
>
> (there are some thousands of rows with similar problems)
>
> I suggest SnapshotSelf instead.  Comments anyone?
>
>              regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo@postgresql.org so that your
> message can get through to the mailing list cleanly
>


--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #



_________________________________________________________
Do You Yahoo!?
Get your free @yahoo.com address at http://mail.yahoo.com



Re: tuptoaster.c must *not* use SnapshotAny

От
Tom Lane
Дата:
Jan Wieck <janwieck@yahoo.com> writes:
>     Really?  So  what  you're  saying  means that while vacuum is
>     running there's a window where one tuple can  exist  multiple
>     times

Moving a tuple in VACUUM isn't fundamentally different from an UPDATE.
You need a visibility rule to tell you which version of the tuple to
pay attention to.  Right now, TOAST hasn't got one.

Per later discussion, Andreas is right that SnapshotSelf won't do.
I think we need a new tqual.c routine just for TOAST.  (The alternative
would be for TOAST to know which snapshot was used to find the main-
table tuple, but that doesn't seem workable.)
        regards, tom lane


Re: tuptoaster.c must *not* use SnapshotAny

От
Bruce Momjian
Дата:
I have added more comments to tqual.c to the stuff Tom just added.  I
added visibility items to the top of each function comment.

---------------------------------------------------------------------------

Tom Lane wrote:
> "Zeugswetter Andreas SB SD" <ZeugswetterA@spardat.at> writes:
> > Can you try give me a hint, why an aborted VACUUM FULL will not allways be 
> > a problem (also for other operations) until you run another VACUUM FULL
> > that succeeds ?
> 
> Because noplace else uses SnapshotAny, basically.  The only really
> legitimate use I can see for it is in Tatsuo's contrib/pgstattuple
> hack: that wants to count both visible and invisible tuples.
> 
> > How do we know, that a (newly) FrozenXid tuple does not still have 
> > a (visible) duplicate ?
> 
> It's *not* visible, if you are applying any visibility checks whatever.
> But SnapshotAny bypasses all visibility checking.
> 
> I forgot to answer your question about what these mean.  See 
> src/backend/utils/time/tqual.c for the gory details, but if you
> are willing to accept the summary comments in that file:
> 
> Any        No visibility check whatever
> 
> Self        Other committed xacts + effects of this xact up to
>         *and including* current command
> 
> Now        Other committed xacts + effects of this xact up to
>         *but not including* current command
> 
> Dirty        like Self, but can see effects of other in-progress
>         xacts too
> 
> Snapshot    like Now, but specified other xacts are considered
>         uncommitted even if they've actually committed by now
> 
> There are also special visibility rules for Update and Vacuum, which
> are not really different from the normal rules, they just return more
> information than "visible" or "not visible".  (Dirty returns some
> side information too.)
> 
> Hmm.  Now that I look at it, Self probably isn't quite right either;
> if we are reading a main-table tuple that's committed dead but is
> still visible according to our snapshot, we need to be able to see
> its toast tuples too; but they're committed dead as well.  Sigh.
> I think we need a special visibility rule for TOAST, that only
> checks for HEAP_MOVED_IN/HEAP_MOVED_OFF (the bits that take care
> of VACUUM moves).
> 
> Comments?
> 
>             regards, tom lane
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Don't 'kill -9' the postmaster
> 

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: tuptoaster.c must *not* use SnapshotAny

От
"Zeugswetter Andreas SB SD"
Дата:
> > How do we know, that a (newly) FrozenXid tuple does not still have 
> > a (visible) duplicate ?
> 
> It's *not* visible, if you are applying any visibility checks whatever.
> But SnapshotAny bypasses all visibility checking.

I am concerned about the case where VACUUM FULL:
1. inserts heap tuple to new location using FrozenXid
2. updates original heap tuples's xmax

What if we crash/abort between step 1 and 2 but we used FrozenXid for 1.
Don't know if we actually do this, but imho we are only allowed 
to use FrozenXid for an inplace vacuum operation.

Andreas


Re: tuptoaster.c must *not* use SnapshotAny

От
Tom Lane
Дата:
"Zeugswetter Andreas SB SD" <ZeugswetterA@spardat.at> writes:
> I am concerned about the case where VACUUM FULL:
> 1. inserts heap tuple to new location using FrozenXid
> 2. updates original heap tuples's xmax

It doesn't matter whether it's FrozenXid or not.  The tuple is not
visible if it's got the wrong setting of HEAP_MOVED_OFF/IN.
        regards, tom lane


Re: tuptoaster.c must *not* use SnapshotAny

От
Tom Lane
Дата:
"Zeugswetter Andreas SB SD" <ZeugswetterA@spardat.at> writes:
>> It doesn't matter whether it's FrozenXid or not.  The tuple is not
>> visible if it's got the wrong setting of HEAP_MOVED_OFF/IN.

> But the FrozenXid tuple has HEAP_MOVED_IN and the original has
> not yet been altered to HEAP_MOVED_OFF because of abort.
> Is the HEAP_MOVED_IN tuple not visible ?

Right.  Actually it doesn't matter whether the old tuple has
HEAP_MOVED_OFF or not; it's still visible *until* the VACUUM commits.
The commit atomically switches us from the state where the unmoved
tuples are good to the state where the moved ones are good.

This is all exactly the same whether FrozenXid is involved or not.
        regards, tom lane


Re: tuptoaster.c must *not* use SnapshotAny

От
"Zeugswetter Andreas SB SD"
Дата:
> > I am concerned about the case where VACUUM FULL:
> > 1. inserts heap tuple to new location using FrozenXid
> > 2. updates original heap tuples's xmax
> 
> It doesn't matter whether it's FrozenXid or not.  The tuple is not
> visible if it's got the wrong setting of HEAP_MOVED_OFF/IN.

But the FrozenXid tuple has HEAP_MOVED_IN and the original has
not yet been altered to HEAP_MOVED_OFF because of abort.
Is the HEAP_MOVED_IN tuple not visible ?

Andreas


Re: tuptoaster.c must *not* use SnapshotAny

От
Jan Wieck
Дата:
Tom Lane wrote:
> "Zeugswetter Andreas SB SD" <ZeugswetterA@spardat.at> writes:
> >> It doesn't matter whether it's FrozenXid or not.  The tuple is not
> >> visible if it's got the wrong setting of HEAP_MOVED_OFF/IN.
>
> > But the FrozenXid tuple has HEAP_MOVED_IN and the original has
> > not yet been altered to HEAP_MOVED_OFF because of abort.
> > Is the HEAP_MOVED_IN tuple not visible ?
>
> Right.  Actually it doesn't matter whether the old tuple has
> HEAP_MOVED_OFF or not; it's still visible *until* the VACUUM commits.
> The commit atomically switches us from the state where the unmoved
> tuples are good to the state where the moved ones are good.
>
> This is all exactly the same whether FrozenXid is involved or not.
   Originally I added SnapshotAny only to solve the problem that   the lookup of the toast table happens  independant
from the   access  to  the  main  tuple, and that in fact the main tuple   could have been deleted by the own
transactionin between.
 
   Would changing SnapshotAny  to  honor  HEAP_MOVED_*  fix  the   problem?  If nothing else by now uses this snapshot,
itcould   be done in place.
 


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #



_________________________________________________________
Do You Yahoo!?
Get your free @yahoo.com address at http://mail.yahoo.com



Re: tuptoaster.c must *not* use SnapshotAny

От
Tom Lane
Дата:
Jan Wieck <janwieck@yahoo.com> writes:
>     Originally I added SnapshotAny only to solve the problem that
>     the lookup of the toast table happens  independant  from  the
>     access  to  the  main  tuple, and that in fact the main tuple
>     could have been deleted by the own transaction in between.

>     Would changing SnapshotAny  to  honor  HEAP_MOVED_*  fix  the
>     problem?  If nothing else by now uses this snapshot, it could
>     be done in place.

What I've done is add an additional snapshot class SnapshotToast
that behaves that way.  There are a couple other uses of SnapshotAny
that I didn't think I wanted to mess with.
        regards, tom lane


Re: tuptoaster.c must *not* use SnapshotAny

От
Tom Lane
Дата:
Hiroshi Inoue <Inoue@tpf.co.jp> writes:
> SnapshotToast doesn't seem a proper solution to me

Do you have a better idea in mind?
        regards, tom lane


Re: tuptoaster.c must *not* use SnapshotAny

От
Hiroshi Inoue
Дата:
Tom Lane wrote:
> 
> Jan Wieck <janwieck@yahoo.com> writes:
> >     Originally I added SnapshotAny only to solve the problem that
> >     the lookup of the toast table happens  independant  from  the
> >     access  to  the  main  tuple, and that in fact the main tuple
> >     could have been deleted by the own transaction in between.
> 
> >     Would changing SnapshotAny  to  honor  HEAP_MOVED_*  fix  the
> >     problem?  If nothing else by now uses this snapshot, it could
> >     be done in place.
> 
> What I've done is add an additional snapshot class SnapshotToast
> that behaves that way.  There are a couple other uses of SnapshotAny
> that I didn't think I wanted to mess with.

AFAIR SnapshotAny was introduced for Foreign Key stuff.
SnapshotToast doesn't seem a proper solution to me but
7.2 should be released ASAP anyway.

regards,
Hiroshi Inoue


Re: tuptoaster.c must *not* use SnapshotAny

От
Hiroshi Inoue
Дата:
Tom Lane wrote:
> 
> Hiroshi Inoue <Inoue@tpf.co.jp> writes:
> > SnapshotToast doesn't seem a proper solution to me
> 
> Do you have a better idea in mind?

The same snapshot for both the main and the toast table
seems better though I don't know how it's difficult to
change. For example is it possible to update a toast
chunk partially using SnapshotToast ?

regards,
Hiroshi Inoue


Re: tuptoaster.c must *not* use SnapshotAny

От
Tom Lane
Дата:
Hiroshi Inoue <Inoue@tpf.co.jp> writes:
> The same snapshot for both the main and the toast table
> seems better though I don't know how it's difficult to
> change.

That would be cleaner but I don't see any way to do it.  The toasted
value gets passed all over the system, potentially, before someone asks
to detoast it.  At that point you really don't have a good way to know
which snapshot was used to fetch the main-table row.

> For example is it possible to update a toast
> chunk partially using SnapshotToast ?

As things stand (with either SnapshotToast or the old SnapshotAny way)
it is never possible to update an individual toast value, either
partially or wholly.  All you can do is lay down a new toast value (with
a new identifying OID) and then delete the old one.

But I'm not sure that this is wrong, or fixable.  Trying to update part
of a toasted value is very much like wanting to update part of an
existing row in-place, which we cannot possibly do.  We have to lay down
a whole new row whenever any part of it is updated.
        regards, tom lane


Re: tuptoaster.c must *not* use SnapshotAny

От
Jan Wieck
Дата:
Tom Lane wrote:
> Hiroshi Inoue <Inoue@tpf.co.jp> writes:
> > The same snapshot for both the main and the toast table
> > seems better though I don't know how it's difficult to
> > change.
>
> That would be cleaner but I don't see any way to do it.  The toasted
> value gets passed all over the system, potentially, before someone asks
> to detoast it.  At that point you really don't have a good way to know
> which snapshot was used to fetch the main-table row.
   That's  IMHO  not the point. The design of toast is that if a   main heap tuple points to a toast id, and the heap
tuple at   scan  time  was  visible,  then  no matter what, the value is   readable.  Regardless  of  any  isolation
level or  command   counter  or  whatnot.  That's achieved by allways assigning a   new toast ID on change of a toasted
value.So if the  columns   value  changed, the old and new rows point to different toast   values not by visibility but
bytoast  id.   If  it  did  not   change, they point to the same.
 
   The  problem  arises  only because SnapshotAny is *too* blind   and ignores the special vacuum HEAP_MOVED_* states,
sothat a   scan after a crashed vacuum full, that left yet unborn moved-   in tuples in the toast table, can see them.
 
   Since the design of toast is NO NEED to care about  snapshots   during  toast  value  fetching,  I don't see why
doingso now   would be a cleaner solution.
 


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #




_________________________________________________________
Do You Yahoo!?
Get your free @yahoo.com address at http://mail.yahoo.com



Re: tuptoaster.c must *not* use SnapshotAny

От
Hiroshi Inoue
Дата:
Tom Lane wrote:
> 
> Hiroshi Inoue <Inoue@tpf.co.jp> writes:
> 
> > For example is it possible to update a toast
> > chunk partially using SnapshotToast ?
> 
> As things stand (with either SnapshotToast or the old SnapshotAny way)
> it is never possible to update an individual toast value, either
> partially or wholly.  All you can do is lay down a new toast value (with
> a new identifying OID) and then delete the old one.
> 
> But I'm not sure that this is wrong, or fixable.  Trying to update part
> of a toasted value is very much like wanting to update part of an
> existing row in-place, which we cannot possibly do.

Bytea seems to be considered as a candidate for BLOB
though I think the entirely new type is preferable.
It seems impossible to implement a functionality like
inv_write using bytea which the current large object
stuff has.

regards,
Hiroshi Inoue


Re: tuptoaster.c must *not* use SnapshotAny

От
Bruce Momjian
Дата:
> Tom Lane wrote:
> > 
> > Hiroshi Inoue <Inoue@tpf.co.jp> writes:
> > 
> > > For example is it possible to update a toast
> > > chunk partially using SnapshotToast ?
> > 
> > As things stand (with either SnapshotToast or the old SnapshotAny way)
> > it is never possible to update an individual toast value, either
> > partially or wholly.  All you can do is lay down a new toast value (with
> > a new identifying OID) and then delete the old one.
> > 
> > But I'm not sure that this is wrong, or fixable.  Trying to update part
> > of a toasted value is very much like wanting to update part of an
> > existing row in-place, which we cannot possibly do.
> 
> Bytea seems to be considered as a candidate for BLOB
> though I think the entirely new type is preferable.
> It seems impossible to implement a functionality like
> inv_write using bytea which the current large object
> stuff has.

Agreed.  I think that was the reason we kept TOAST and large objects,
because large objects were designed for random read-write.  If we can
get large objects to auto-delete, probably with pg_depend, we can then
use them seamlessly with BLOB I/O routines.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: tuptoaster.c must *not* use SnapshotAny

От
Bruce Momjian
Дата:
> > Agreed.  I think that was the reason we kept TOAST and large objects,
> > because large objects were designed for random read-write.  If we can
> > get large objects to auto-delete, probably with pg_depend, we can then
> > use them seamlessly with BLOB I/O routines.
> 
> Oops I seem to have missed the discussion about excluding
> bytea from the candidate from BLOB. Yes now we seem to have
> a good reason to exclude existent type from the candidate
> of BLOB.

Well, we had the discussion when Jan was adding TOAST, and Jan was
saying we still need large objects for I/O purposes and for very large
items.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: tuptoaster.c must *not* use SnapshotAny

От
Hiroshi Inoue
Дата:
Bruce Momjian wrote:
> 
> > Tom Lane wrote:
> > >
> > > Hiroshi Inoue <Inoue@tpf.co.jp> writes:
> > >
> > > > For example is it possible to update a toast
> > > > chunk partially using SnapshotToast ?
> > >
> > > As things stand (with either SnapshotToast or the old SnapshotAny way)
> > > it is never possible to update an individual toast value, either
> > > partially or wholly.  All you can do is lay down a new toast value (with
> > > a new identifying OID) and then delete the old one.
> > >
> > > But I'm not sure that this is wrong, or fixable.  Trying to update part
> > > of a toasted value is very much like wanting to update part of an
> > > existing row in-place, which we cannot possibly do.
> >
> > Bytea seems to be considered as a candidate for BLOB
> > though I think the entirely new type is preferable.
> > It seems impossible to implement a functionality like
> > inv_write using bytea which the current large object
> > stuff has.
> 
> Agreed.  I think that was the reason we kept TOAST and large objects,
> because large objects were designed for random read-write.  If we can
> get large objects to auto-delete, probably with pg_depend, we can then
> use them seamlessly with BLOB I/O routines.

Oops I seem to have missed the discussion about excluding
bytea from the candidate from BLOB. Yes now we seem to have
a good reason to exclude existent type from the candidate
of BLOB.

regards,
Hiroshi Inoue


Re: tuptoaster.c must *not* use SnapshotAny

От
Tom Lane
Дата:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Well, we had the discussion when Jan was adding TOAST, and Jan was
> saying we still need large objects for I/O purposes and for very large
> items.

Yeah.  I hadn't realized until this conversation that TOAST couldn't
support partial updates of huge values.  That is a definite problem for
supporting lo_read/lo_write-type operations.
        regards, tom lane


Re: tuptoaster.c must *not* use SnapshotAny

От
Bruce Momjian
Дата:
Jan Wieck wrote:
> Bruce Momjian wrote:
> >
> > Agreed.  I think that was the reason we kept TOAST and large objects,
> > because large objects were designed for random read-write.  If we can
> > get large objects to auto-delete, probably with pg_depend, we can then
> > use them seamlessly with BLOB I/O routines.
> 
>     Not  entirely.  BLOB's  should  have  copy semantics, so that
>     doing
> 
>         INSERT INTO tab1 SELECT id, blob_column FROM tab2 ...
> 
>     and subsequently changing the blob in  either  tab1  or  tab2
>     does  NOT change the blob in the other table. Currently, even
>     if you make the pg_depend stuff smart enough to let the  blob
>     live  until  all  references  are  gone, the two tables would
>     share the same blob.  And that's not only  IMHO  wrong,  it's
>     also incompatible to Oracle :-)

Yes, we have to add duplication large object stuff hard-wired to that new data
type.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: tuptoaster.c must *not* use SnapshotAny

От
Hiroshi Inoue
Дата:
Bruce Momjian wrote:
> 
> > > Agreed.  I think that was the reason we kept TOAST and large objects,
> > > because large objects were designed for random read-write.  If we can
> > > get large objects to auto-delete, probably with pg_depend, we can then
> > > use them seamlessly with BLOB I/O routines.
> >
> > Oops I seem to have missed the discussion about excluding
> > bytea from the candidate from BLOB. Yes now we seem to have
> > a good reason to exclude existent type from the candidate
> > of BLOB.
> 
> Well, we had the discussion when Jan was adding TOAST, and Jan was
> saying we still need large objects for I/O purposes and for very large
> items.

Though I've often seen the reference to bytea BLOB
I remember no clear negation. Don't we have to negate
it clearly from the first ?

regards,
Hiroshi Inoue


Re: tuptoaster.c must *not* use SnapshotAny

От
Jan Wieck
Дата:
Bruce Momjian wrote:
>
> Agreed.  I think that was the reason we kept TOAST and large objects,
> because large objects were designed for random read-write.  If we can
> get large objects to auto-delete, probably with pg_depend, we can then
> use them seamlessly with BLOB I/O routines.
   Not  entirely.  BLOB's  should  have  copy semantics, so that   doing
       INSERT INTO tab1 SELECT id, blob_column FROM tab2 ...
   and subsequently changing the blob in  either  tab1  or  tab2   does  NOT change the blob in the other table.
Currently,even   if you make the pg_depend stuff smart enough to let the  blob   live  until  all  references  are
gone,the two tables would   share the same blob.  And that's not only  IMHO  wrong,  it's   also incompatible to Oracle
:-)


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #



_________________________________________________________
Do You Yahoo!?
Get your free @yahoo.com address at http://mail.yahoo.com



Re: tuptoaster.c must *not* use SnapshotAny

От
Bruce Momjian
Дата:
> > Well, we had the discussion when Jan was adding TOAST, and Jan was
> > saying we still need large objects for I/O purposes and for very large
> > items.
> 
> Though I've often seen the reference to bytea BLOB
> I remember no clear negation. Don't we have to negate
> it clearly from the first ?

As I remember, we only mention BLOB when people ask.  We don't refer to
either as BLOB, only as TOAST/bytea and large objects.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: tuptoaster.c must *not* use SnapshotAny

От
Joe Conway
Дата:
Hiroshi Inoue wrote:

> Though I've often seen the reference to bytea BLOB
> I remember no clear negation. Don't we have to negate
> it clearly from the first ?
> 
> regards,
> Hiroshi Inoue


With the changes in 7.2, bytea is actually very close to matching the
SQL99 definition of binary strings, which it also refers to as BLOB.
BLOBS should include support for (section 4.3):

<comparison predicate> - yes
<blob concatenation> - yes
<blob substring function> - yes
<blob overlay function> - no
<trim function> - yes
<length expression> - yes
<position expression> - yes
<like predicate> - yes

Other than the overlay function, the primary thing missing is the use ofhex as the I/O representation.

See table 3-9 at
http://candle.pha.pa.us/main/writings/pgsql/sgml/datatype-binary.html
for more info.

Joe





Re: tuptoaster.c must *not* use SnapshotAny

От
Jan Wieck
Дата:
Hiroshi Inoue wrote:
[Charset iso-2022-jp unsupported, skipping...]
> Though I've often seen the reference to bytea BLOB
> I remember no clear negation. Don't we have to negate
> it clearly from the first ?
   Any  datatype that has the potential to be loaded into memory   entirely by the backend is dangerous. It  leads  to
problems  like  "out  of  swapspace",  which  aren't  really  funny  in   production.
 
   We need something that  is  restricted  to  streaming  access   only.  Yet, it needs to have copy semantics.
   That  means to me, that this will not only be a new datatype,   but  some  more  infrastructure  too.  We  need
INSERT  ...   RETURNING  ...   where  the  values  for  BLOBs  are given as   NEW_BLOB() and the returned identifier is
subsequently  used   to open the created BLOBs for writing and pump the data in.
 
   We  need  heap  access level tuple support to duplicate those   values on INSERT ... SELECT, to remove them on
DELETEand  to   manipulate them on UPDATE.
 
   We  need new functionality in procedural languages to support   streaming in and out of these guy's, where the
actualdata is   buffered  in  temp  files  and  only  blob identifiers passed   around.
 
   I see the final storage of blob data to happen in  the  toast   table,   sliced,   diced   and   handy   (most
probably not   compressed). But I don't see an easy way to abuse  the  bytea   datatype for real blob support.
 


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #



_________________________________________________________
Do You Yahoo!?
Get your free @yahoo.com address at http://mail.yahoo.com



Re: tuptoaster.c must *not* use SnapshotAny

От
Joe Conway
Дата:
Jan Wieck wrote:

> 
>     Any  datatype that has the potential to be loaded into memory
>     entirely by the backend is dangerous. It  leads  to  problems
>     like  "out  of  swapspace",  which  aren't  really  funny  in
>     production.
> 
>     We need something that  is  restricted  to  streaming  access
>     only.  Yet, it needs to have copy semantics.
> 
>     That  means to me, that this will not only be a new datatype,
>     but  some  more  infrastructure  too.  We  need  INSERT   ...
>     RETURNING  ...   where  the  values  for  BLOBs  are given as
>     NEW_BLOB() and the returned identifier is  subsequently  used
>     to open the created BLOBs for writing and pump the data in.
> 
>     We  need  heap  access level tuple support to duplicate those
>     values on INSERT ... SELECT, to remove them on DELETE and  to
>     manipulate them on UPDATE.
> 
>     We  need new functionality in procedural languages to support
>     streaming in and out of these guy's, where the actual data is
>     buffered  in  temp  files  and  only  blob identifiers passed
>     around.
> 
>     I see the final storage of blob data to happen in  the  toast
>     table,   sliced,   diced   and   handy   (most  probably  not
>     compressed). But I don't see an easy way to abuse  the  bytea
>     datatype for real blob support.
> 

All this may be needed for BLOBs that are *huge*, but there are at least 
two classes of BLOB data that are not typically huge in practice. One is 
encrypted or hashed data, the other is image data for web pages. Either 
of these work very well with the current bytea support.

Joe




Re: tuptoaster.c must *not* use SnapshotAny

От
Hiroshi Inoue
Дата:
Joe Conway wrote:
> 
> Hiroshi Inoue wrote:
> 
> > Though I've often seen the reference to bytea BLOB
> > I remember no clear negation. Don't we have to negate
> > it clearly from the first ?
> >
> > regards,
> > Hiroshi Inoue
> 
> With the changes in 7.2, bytea is actually very close to matching the
> SQL99 definition of binary strings, which it also refers to as BLOB.
> BLOBS should include support for (section 4.3):
> 
> <comparison predicate> - yes
> <blob concatenation> - yes
> <blob substring function> - yes
> <blob overlay function> - no
> <trim function> - yes
> <length expression> - yes
> <position expression> - yes
> <like predicate> - yes

> Other than the overlay function, 

It seems good as long as they are not large(huge ?).
It seems a significant flaw for bytea to have no
possibility of partial update.

> the primary thing missing is the use of
>  hex as the I/O representation.

I don't love the current I/O representation of bytea.
It seems worth adding a new data type only to change the 
binary I/O representation.

regards,
Hiroshi Inoue