Обсуждение: data model one large and many small columns

Поиск
Список
Период
Сортировка

data model one large and many small columns

От
"Campbell, Lance"
Дата:

PostgreSQL 9.5.3

 

Suppose you have a table that consists of a lot of small sized columns with one really large text column.  The text column represents an average sized web page.  When inserting or updating you will work with only one row at a time. 

 

When selecting information you will either:

A)     select all columns in one row for display or editing purposes. 

B)      Or you will select hundreds of rows but NOT the very large text column.

 

Question: Is there any performance to be gained from PostgreSQL by storing the data as two tables versus one table?  In the two table model you would put the large column in a secondary table with a foreign key back to the primary table.  So each table would have a one to one relationship.  In the two table model when you need a single record with everything you would select from both tables based on a key.  When selecting hundreds of rows at one time you would only select from the primary table.

 

I know there is this concept of TOAST.  I don’t know if it even matters to PostgreSQL if I had two separate tables or just one.

 

Thanks for your insight.

 

Lance

 

 

Re: data model one large and many small columns

От
Scott Whitney
Дата:
My immediate response is that if you know at the time you build the query and exclude the BLOB/text, it _should_ be immaterial, but inserts might suffer in and single table based on indexes.

Discussion?




-------- Original message --------
From: "Campbell, Lance" <lance@illinois.edu>
Date: 07/26/2016 3:24 PM (GMT-06:00)
To: pgsql-admin@postgresql.org
Subject: [ADMIN] data model one large and many small columns

PostgreSQL 9.5.3

 

Suppose you have a table that consists of a lot of small sized columns with one really large text column.  The text column represents an average sized web page.  When inserting or updating you will work with only one row at a time. 

 

When selecting information you will either:

A)     select all columns in one row for display or editing purposes. 

B)      Or you will select hundreds of rows but NOT the very large text column.

 

Question: Is there any performance to be gained from PostgreSQL by storing the data as two tables versus one table?  In the two table model you would put the large column in a secondary table with a foreign key back to the primary table.  So each table would have a one to one relationship.  In the two table model when you need a single record with everything you would select from both tables based on a key.  When selecting hundreds of rows at one time you would only select from the primary table.

 

I know there is this concept of TOAST.  I don’t know if it even matters to PostgreSQL if I had two separate tables or just one.

 

Thanks for your insight.

 

Lance

 

 



Journyx, Inc.
7600 Burnet Road #300
Austin, TX 78757
www.journyx.com

p 512.834.8888 
f 512-834-8858 

Do you receive our promotional emails? You can subscribe or unsubscribe to those emails at http://go.journyx.com/emailPreference/e/4932/714/ 

Re: data model one large and many small columns

От
"David G. Johnston"
Дата:
On Tue, Jul 26, 2016 at 4:22 PM, Campbell, Lance <lance@illinois.edu> wrote:

Is there any performance to be gained from PostgreSQL by storing the data as two tables versus one table?


​PostgreSQL already does this for you via the TOAST mechanism you mentioned.


David J.

Re: data model one large and many small columns

От
Tom Lane
Дата:
"Campbell, Lance" <lance@illinois.edu> writes:
> Suppose you have a table that consists of a lot of small sized columns with one really large text column.  The text
columnrepresents an average sized web page.  When inserting or updating you will work with only one row at a time. 

> When selecting information you will either:
> A)     select all columns in one row for display or editing purposes.
> B)      Or you will select hundreds of rows but NOT the very large text column.

> Question: Is there any performance to be gained from PostgreSQL by
> storing the data as two tables versus one table?

No, probably not; TOAST will effectively do that for you by off-loading
the large text values into the side table.

If you were to do an explicit join, there would be some use-cases ---
mainly where you were selecting a LOT of rows --- where in theory you
could get a smarter plan from the explicit join.  TOAST will effectively
always fetch the text values via a nestloop-with-inner-indexscan plan,
but maybe hashing or merging would be smarter.  Unfortunately, with two
explicit tables, TOAST would still apply to the second table, which means
that what you've really got under the hood is a three-way join, with the
intermediate table contributing nothing except overhead.

So don't bother ...

            regards, tom lane


Re: data model one large and many small columns

От
"Campbell, Lance"
Дата:
I had a hunch this was the case.  I just wanted to double check.

Lance

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Tuesday, July 26, 2016 4:18 PM
To: Campbell, Lance <lance@illinois.edu>
Cc: pgsql-admin@postgresql.org
Subject: Re: [ADMIN] data model one large and many small columns

"Campbell, Lance" <lance@illinois.edu> writes:
> Suppose you have a table that consists of a lot of small sized columns with one really large text column.  The text
columnrepresents an average sized web page.  When inserting or updating you will work with only one row at a time. 

> When selecting information you will either:
> A)     select all columns in one row for display or editing purposes.
> B)      Or you will select hundreds of rows but NOT the very large text column.

> Question: Is there any performance to be gained from PostgreSQL by
> storing the data as two tables versus one table?

No, probably not; TOAST will effectively do that for you by off-loading the large text values into the side table.

If you were to do an explicit join, there would be some use-cases --- mainly where you were selecting a LOT of rows ---
wherein theory you could get a smarter plan from the explicit join.  TOAST will effectively always fetch the text
valuesvia a nestloop-with-inner-indexscan plan, but maybe hashing or merging would be smarter.  Unfortunately, with two
explicittables, TOAST would still apply to the second table, which means that what you've really got under the hood is
athree-way join, with the intermediate table contributing nothing except overhead. 

So don't bother ...

            regards, tom lane