Обсуждение: Re: [HACKERS] generic LONG VARLENA

Поиск
Список
Период
Сортировка

Re: [HACKERS] generic LONG VARLENA

От
Zeugswetter Andreas SB
Дата:
> I am excited about the long data type.  This is _the_ way to do long
> data types.  Have any of the commercial databases figured out this way
> to do it.  I can't imagine a better system.

The commercial db's usually make the dba decide on a per column basis 
whether the value is stored inside the table or in an extra space 
(blobspace,lobspace ...). They all have propietary syntax for this.
(I would probably like it configurabe, whith some reasonable default) 
It is usually available for the text/byte and user defined datatypes. 
In PostgreSQL the array types come to mind.

What I think would be good is, if you could avoid the need for an index on 
the _LARGE_.. table.
My Idea would be to store an xtid of the first lob page slot in the user
table,
and have an xtid pointer to the next lob page slot in it, and so on.
That way you could avoid indices on the LARGE table.
SnapshotAny() would also see the correct long, since an updated value would 
get a new xtid anyway. No need to use up an extra oid.

Since lob's are typically large, the large overhead would be especially 
painful, so a different relkind with another pagelayout seems adequate.  

The pointer would imho be:

longbit|length|largetableoid|xtid_of_first_lobpage|loblength

Just some ideas
Andreas


Re: [HACKERS] generic LONG VARLENA

От
wieck@debis.com (Jan Wieck)
Дата:
Andreas Zeugswetter wrote:

> > I am excited about the long data type.  This is _the_ way to do long
> > data types.  Have any of the commercial databases figured out this way
> > to do it.  I can't imagine a better system.
>
> The commercial db's usually make the dba decide on a per column basis
> whether the value is stored inside the table or in an extra space
> (blobspace,lobspace ...). They all have propietary syntax for this.
> (I would probably like it configurabe, whith some reasonable default)
> It is usually available for the text/byte and user defined datatypes.

    Must  have  been  proprietary syntax. There are may places in
    SQL92 and SQL3 specs, where the words IMPLEMENTATION  DEFINED
    appear.   With   so   many   possible   differences   between
    implementations within standard  compliance,  there  must  be
    differences in the language too.

    For  the  database schema, we cannot avoid proprietary syntax
    to use  implementation  specific  features.   We  don't  have
    tablespaces, extents etc., but if we ever implement something
    like that, should we be unable to customize it because  there
    is no syntax defined in the standard?

> In PostgreSQL the array types come to mind.

    There  was  a  user request about "tuple too big" right today
    when storing a polygon.

> What I think would be good is, if you could avoid the need for an index on
> the _LARGE_.. table.
> My Idea would be to store an xtid of the first lob page slot in the user
> table,
> and have an xtid pointer to the next lob page slot in it, and so on.
> That way you could avoid indices on the LARGE table.
> SnapshotAny() would also see the correct long, since an updated value would
> get a new xtid anyway. No need to use up an extra oid.

    While I would like such an approach too, I don't want  to  do
    it  really.   It  would  require  to  treat  the  lob  tuples
    different from regular ones in vacuum. It is one of the  most
    important  tools  for  a  productional DB.  One single broken
    xtid chain  due  to  an  aborted  vacuum  will  corrupt  your
    database.  Better  keep  it working the same as for a regular
    table.

> Since lob's are typically large, the large overhead would be especially
> painful, so a different relkind with another pagelayout seems adequate.

    No, I think a single Oid index  on  a  relation,  where  only
    usually large tuples are stored is a very small overhead.


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#========================================= wieck@debis.com (Jan Wieck) #

Re: [HACKERS] generic LONG VARLENA

От
Bruce Momjian
Дата:
[Charset iso-8859-1 unsupported, filtering to ASCII...]
> 
> > I am excited about the long data type.  This is _the_ way to do long
> > data types.  Have any of the commercial databases figured out this way
> > to do it.  I can't imagine a better system.
> 
> The commercial db's usually make the dba decide on a per column basis 
> whether the value is stored inside the table or in an extra space 
> (blobspace,lobspace ...). They all have propietary syntax for this.
> (I would probably like it configurabe, whith some reasonable default) 
> It is usually available for the text/byte and user defined datatypes. 
> In PostgreSQL the array types come to mind.
> 
> What I think would be good is, if you could avoid the need for an index on 
> the _LARGE_.. table.
> My Idea would be to store an xtid of the first lob page slot in the user
> table,
> and have an xtid pointer to the next lob page slot in it, and so on.
> That way you could avoid indices on the LARGE table.
> SnapshotAny() would also see the correct long, since an updated value would 
> get a new xtid anyway. No need to use up an extra oid.

You are getting the data in 8k chunks, so it shouldn't be bad.  I think
using ctid is overly complex and makes vacuum fragile on that table. 
Better to use the tools we have like indexing and the standard tuple
layout code.  Custom solutions like ctid are better off only when we see
seriouis performance problems and can't resolve them any other way.

For example, having an expanded tuple cache will give us great speed
improvements.


--  Bruce Momjian                        |  http://www.op.net/~candle maillist@candle.pha.pa.us            |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026