Discussion: Performance impact of record sizes


Performance impact of record sizes

From
John Moore
Date:
We have a need to store text data which is typically just a hundred or so
bytes, but in some cases may extend to a few thousand. Our current field
is a varchar(1024), which is not large enough. The key data in this same
record is fixed-size and much smaller.

Our application is primarily transaction oriented, which means that records
will normally be fetched via random access, not sequential scans.

The question is: what size thresholds exist? I assume that there is a
"page" size over which the record will be split into more than one. What is
that size, and does the spill cost any more or less than if I had split the
record into two or more individual records in order to handle the same data?

Obviously, the easiest thing for me to do is just set the varchar to
something big (say - 10K) but I don't want to do this without understanding
the OLTP performance impact.

Thanks in advance



John Moore

http://www.tinyvital.com/personal.html

UNITED WE STAND






Re: Performance impact of record sizes

From
Bruce Momjian
Date:
John Moore wrote:
> We have a need to store text data which is typically just a hundred or so
> bytes, but in some cases may extend to a few thousand. Our current field
> is a varchar(1024), which is not large enough. The key data in this same
> record is fixed-size and much smaller.
>
> Our application is primarily transaction oriented, which means that records
> will normally be fetched via random access, not sequential scans.
>
> The question is: what size thresholds exist? I assume that there is a
> "page" size over which the record will be split into more than one. What is
> that size, and does the spill cost any more or less than if I had split the
> record into two or more individual records in order to handle the same data?
>
> Obviously, the easiest thing for me to do is just set the varchar to
> something big (say - 10K) but I don't want to do this without understanding
> the OLTP performance impact.
>

If you don't want a limit, use TEXT.  Long values are automatically
stored in TOAST tables to avoid performance problems with sequential
scans over long row values.
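
For example, a definition along these lines (the table and column names
are purely illustrative) gives you an unlimited-length field with exactly
that behavior:

    CREATE TABLE notes (
        id   integer PRIMARY KEY,  -- small, fixed-size key data
        body text                  -- no declared limit; TOASTed only if the row grows large
    );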

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026



Re: Performance impact of record sizes

From
John Moore
Date:
>If you don't want a limit, use TEXT.  Long values are automatically
>stored in TOAST tables to avoid performance problems with sequential
>scans over long row values.


Thanks...

I wasn't quite clear enough in my question... I am focused on OLTP
performance, and in my case the vast majority of the rows will have only a
few hundred bytes in that column, while a few (1%?) may be larger.

I assume that internally there is a fixed page size (by which I mean cache
buffer size or disk read size) for normal records. In my case, 99% of the
physical rows should be short, and would fit easily in whatever that size
is. So I *suspect* I want to keep the data in the physical row, rather than
using TEXT and having it stored separately from the record. The question
is: are there any unexpected consequences? For example, if I have a whole
bunch of rows with, say, a 10K varchar field which is only populated with a
hundred or two hundred bytes each, will it perform just as well as if that
field were defined as a 200-byte field?

A related question: is it more expensive to use varchar than fixed char
fields? I assume there is some additional work in physically unpacking the
record.

My past experience is with Informix, much of it with very old versions
where views were high-cost, and so were varchars. Likewise, you didn't want
your physical row to exceed the size of a physical page if you could avoid it.

John





Re: Performance impact of record sizes

From
Bruce Momjian
Date:
John Moore wrote:
>
> >If you don't want a limit, use TEXT.  Long values are automatically
> >stored in TOAST tables to avoid performance problems with sequential
> >scans over long row values.
>
>
> Thanks...
>
> I wasn't quite clear enough in my question... I am focused on OLTP
> performance, and in my case the vast majority of the rows will have only a
> few hundred bytes in that column, while a few (1%?) may be larger.
>
> I assume that internally there is a fixed page size (by which I mean cache
> buffer size or disk read size) for normal records. In my case, 99% of the
> physical rows should be short, and would fit easily in whatever that size
> is. So I *suspect* I want to keep the data in the physical row, rather than
> using TEXT and having it stored separately from the record. The question
> is: are there any unexpected consequences? For example, if I have a whole
> bunch of rows with, say, a 10K varchar field which is only populated with a
> hundred or two hundred bytes each, will it perform just as well as if that
> field were defined as a 200-byte field?
>
> A related question: is it more expensive to use varchar than fixed char
> fields? I assume there is some additional work in physically unpacking the
> record.
>
> My past experience is with Informix, much of it with very old versions
> where views were high-cost, and so were varchars. Likewise, you didn't want
> your physical row to exceed the size of a physical page if you could avoid it.

When the row length nears or exceeds the page size (8K), the longer column
values are automatically pushed into a TOAST backup table. It is all done
transparently, no matter what length you specify for the column, and
performance stays good: if you have a 100MB value in a column and do a
sequential scan, that 100MB is only read if the column containing it is
actually accessed.

Specifically, it isn't the defined length of the column that counts, but
the total length of each individual row that determines whether the TOAST
backup table is used for _some_ of the column values.
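
If you are curious, you can see the TOAST table paired with a given table
in the pg_class catalog; for instance, assuming a table named notes:

    -- reltoastrelid links a table to its TOAST backup table
    SELECT t.relname AS toast_table
    FROM pg_class c
    JOIN pg_class t ON t.oid = c.reltoastrelid
    WHERE c.relname = 'notes';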

Does that answer your question?

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026



Re: Performance impact of record sizes

From
Tom Lane
Date:
John Moore <postgres@tinyvital.com> writes:
> So I *suspect* I want to keep the data in the physical row, rather than
> using TEXT and having it stored separately from the record.

You seem to be reading something into the TEXT type that's not there;
perhaps you are carrying over associations from some other DBMS?
Here's how it works in Postgres:

The *only* performance difference between TEXT and VARCHAR(n) is that
for a VARCHAR(n) column, on insert or update there's an extra function
call applied to the new value to verify that its length is within N.
TEXT makes no such check; so TEXT has a small performance advantage if
there are lots of updates.  On the SELECT side there's no difference.
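
That check is easy to see for yourself (the names here are made up):

    CREATE TABLE demo (v varchar(5));
    INSERT INTO demo VALUES ('abc');     -- fine: within the declared limit
    INSERT INTO demo VALUES ('abcdef');  -- rejected by the length check with a "value too long" error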

Both TEXT and VARCHAR(n) data is stored using TOAST (automatic
compression or out-of-line storage) if the row gets too big.  There
is no difference between the two datatypes on this score.

For the particular application you describe I think TOAST storage will
be ideal, since the overhead of compression or out-of-line storage is
only paid on the few rows where it's needed.

What you really ought to be asking yourself is whether (for this
application) a hard limit on the field width makes sense at all.
IMHO you should only use VARCHAR(n) if you can make a defensible
argument for the particular value of N you are using.  If you can't
explain why you are setting the field width limit, then you probably
do not need a specific limit at all, and should be using TEXT.
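
If a defensible rule does turn up later, one option (a sketch, with a
made-up name and limit) is TEXT plus a CHECK constraint, which at least
records the reason for the limit where someone can find it:

    CREATE TABLE messages (
        -- the limit lives in a named, documented constraint, not the type
        body text CONSTRAINT body_reasonable_len CHECK (char_length(body) <= 10000)
    );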

> A related question: is it more expensive to use varchar than fixed char
> fields?

There is no savings from using CHAR(n) --- most of the time it's a
substantial loss, because of the extra I/O costs associated with the
space used by all those padding blanks.  I can think of very very few
applications where CHAR(n) is really a sensible choice over VARCHAR(n).
US state postal codes (CHAR(2)) are an example, but beyond that the
valid use cases are mighty thin on the ground.
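
On servers new enough to have pg_column_size() (which postdates this
thread), the padding cost is easy to measure directly:

    SELECT pg_column_size('ab'::char(100))    AS blank_padded,  -- about 101 bytes: the blanks are stored
           pg_column_size('ab'::varchar(100)) AS varying;       -- a few bytes: header plus two characters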

Bruce, it occurs to me that the "Performance Tips" section of the user's
guide ought to have an explanation of the performance implications of
TOAST.  We have some material in the FAQ but there's precious little
mention in the manual.  Thoughts?

            regards, tom lane



Re: Performance impact of record sizes

From
Bruce Momjian
Date:
Tom Lane wrote:
> There is no savings from using CHAR(n) --- most of the time it's a
> substantial loss, because of the extra I/O costs associated with the
> space used by all those padding blanks.  I can think of very very few
> applications where CHAR(n) is really a sensible choice over VARCHAR(n).
> US state postal codes (CHAR(2)) are an example, but beyond that the
> valid use cases are mighty thin on the ground.
>
> Bruce, it occurs to me that the "Performance Tips" section of the user's
> guide ought to have an explanation of the performance implications of
> TOAST.  We have some material in the FAQ but there's precious little
> mention in the manual.  Thoughts?

Yes, I am working on my O'Reilly slides now and am dealing with how to
address TOAST for admins, specifically for disk space computations,
which I did cover in the new 'disk usage' section of the manual.

And in talking to the questioner, I was wondering if we have this
auto-out-of-line TOAST description anywhere in the docs.  If not, we
need to add that.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026



Re: Performance impact of record sizes

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> And in talking to the questioner, I was wondering if we have this
> auto-out-of-line TOAST description anywhere in the docs.  If not, we
> need to add that.

Yes; I am not sure that that ever got into the main docs :-(.
We were so tickled with the notion that you no longer needed to
think about oversized fields, that we forgot that it ought to be
documented anyhow.

Jan's old TOAST project page,
http://www.ca.postgresql.org/projects/devel-toast.html
has some stuff that could be recycled into the manual, and I think
we have done some good writeups in response to older questions in
the mailing lists.

            regards, tom lane



Re: Performance impact of record sizes

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > And in talking to the questioner, I was wondering if we have this
> > auto-out-of-line TOAST description anywhere in the docs.  If not, we
> > need to add that.
>
> Yes; I am not sure that that ever got into the main docs :-(.
> We were so tickled with the notion that you no longer needed to
> think about oversized fields, that we forgot that it ought to be
> documented anyhow.
>
> Jan's old TOAST project page,
> http://www.ca.postgresql.org/projects/devel-toast.html
> has some stuff that could be recycled into the manual, and I think
> we have done some good writeups in response to older questions in
> the mailing lists.

Yea, I will keep this email and merge something in.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026



Re: Performance impact of record sizes

From
John Moore
Date:
At 04:49 PM 7/4/2002, Tom Lane wrote:
>John Moore <postgres@tinyvital.com> writes:
> > So I *suspect* I want to keep the data in the physical row, rather than
> > using TEXT and having it stored separately from the record.
>
>You seem to be reading something into the TEXT type that's not there;
>perhaps you are carrying over associations from some other DBMS?
>Here's how it works in Postgres:

Indeed.


>The *only* performance difference between TEXT and VARCHAR(n) is that
>for a VARCHAR(n) column, on insert or update there's an extra function
>call applied to the new value to verify that its length is within N.
>TEXT makes no such check; so TEXT has a small performance advantage if
>there are lots of updates.  On the SELECT side there's no difference.
>
>Both TEXT and VARCHAR(n) data is stored using TOAST (automatic
>compression or out-of-line storage) if the row gets too big.  There
>is no difference between the two datatypes on this score.
>
>For the particular application you describe I think TOAST storage will
>be ideal, since the overhead of compression or out-of-line storage is
>only paid on the few rows where it's needed.
>
>What you really ought to be asking yourself is whether (for this
>application) a hard limit on the field width makes sense at all.
>IMHO you should only use VARCHAR(n) if you can make a defensible
>argument for the particular value of N you are using.  If you can't
>explain why you are setting the field width limit, then you probably
>do not need a specific limit at all, and should be using TEXT.

Yes... now that I understand the performance ramifications, this makes
perfect sense. Overall, the way you guys are doing this looks very elegant,
and just right for what I am doing.


> > A related question: is it more expensive to use varchar than fixed char
> > fields?
>
>There is no savings from using CHAR(n) --- most of the time it's a
>substantial loss, because of the extra I/O costs associated with the
>space used by all those padding blanks.  I can think of very very few
>applications where CHAR(n) is really a sensible choice over VARCHAR(n).
>US state postal codes (CHAR(2)) are an example, but beyond that the
>valid use cases are mighty thin on the ground.

My question was indeed a carry-over from a previous database (Informix) and
from performance analyses done about 14 years ago!


>Bruce, it occurs to me that the "Performance Tips" section of the user's
>guide ought to have an explanation of the performance implications of
>TOAST.  We have some material in the FAQ but there's precious little
>mention in the manual.  Thoughts?

I would like that. I did look in the Performance Tips section before asking.




John Moore

http://www.tinyvital.com/personal.html

UNITED WE STAND




Re: Performance impact of record sizes

From
Shaun Thomas
Date:
On Thu, 4 Jul 2002, Tom Lane wrote:

> I can think of very very few applications where CHAR(n) is really a
> sensible choice over VARCHAR(n).

Text hashes such as MD5 and crypt output, stock or serial numbers,
automotive VIN codes, invoice sequences, emulated bitmasks, etc.  Lots of
industry-specific things are non-varying sequences of characters.

Besides, you don't know true horror until you've seen an amateur set up
an entire database using nothing but text columns in all the tables.
Sometimes these types convey meaning about the data they contain, too.
^_^

> US state postal codes (CHAR(2)) are an example, but beyond that the
> valid use cases are mighty thin on the ground.

Don't forget ZIP codes.  You can get away with a char(9), or a char(5)
plus an optional char(4) for the postal extension.  ^_^
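
For fields like those, the width is part of the data's definition, so
something like this sketch (names invented) is reasonable:

    CREATE TABLE shipments (
        md5_hash char(32),  -- an MD5 hex digest is always 32 characters
        state    char(2),   -- US state postal code
        zip5     char(5),   -- five-digit ZIP code
        zip4     char(4)    -- optional ZIP+4 extension
    );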

--
+-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-+
| Shaun M. Thomas                INN Database Administrator           |
| Phone: (309) 743-0812          Fax  : (309) 743-0830                |
| Email: sthomas@townnews.com    AIM  : trifthen                      |
| Web  : www.townnews.com                                             |
|                                                                     |
|     "Most of our lives are about proving something, either to       |
|      ourselves or to someone else."                                 |
|                                           -- Anonymous              |
+-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-+





Re: Performance impact of record sizes

From
Bruce Momjian
Date:
I see in the data types section for character types this text:

     There are no performance differences between these three types,
     apart from the increased storage size when using the blank-padded
     type.

I can't improve on that.

---------------------------------------------------------------------------

Tom Lane wrote:
> John Moore <postgres@tinyvital.com> writes:
> > So I *suspect* I want to keep the data in the physical row, rather than
> > using TEXT and having it stored separately from the record.
>
> You seem to be reading something into the TEXT type that's not there;
> perhaps you are carrying over associations from some other DBMS?
> Here's how it works in Postgres:
>
> The *only* performance difference between TEXT and VARCHAR(n) is that
> for a VARCHAR(n) column, on insert or update there's an extra function
> call applied to the new value to verify that its length is within N.
> TEXT makes no such check; so TEXT has a small performance advantage if
> there are lots of updates.  On the SELECT side there's no difference.
>
> Both TEXT and VARCHAR(n) data is stored using TOAST (automatic
> compression or out-of-line storage) if the row gets too big.  There
> is no difference between the two datatypes on this score.
>
> For the particular application you describe I think TOAST storage will
> be ideal, since the overhead of compression or out-of-line storage is
> only paid on the few rows where it's needed.
>
> What you really ought to be asking yourself is whether (for this
> application) a hard limit on the field width makes sense at all.
> IMHO you should only use VARCHAR(n) if you can make a defensible
> argument for the particular value of N you are using.  If you can't
> explain why you are setting the field width limit, then you probably
> do not need a specific limit at all, and should be using TEXT.
>
> > A related question: is it more expensive to use varchar than fixed char
> > fields?
>
> There is no savings from using CHAR(n) --- most of the time it's a
> substantial loss, because of the extra I/O costs associated with the
> space used by all those padding blanks.  I can think of very very few
> applications where CHAR(n) is really a sensible choice over VARCHAR(n).
> US state postal codes (CHAR(2)) are an example, but beyond that the
> valid use cases are mighty thin on the ground.
>
> Bruce, it occurs to me that the "Performance Tips" section of the user's
> guide ought to have an explanation of the performance implications of
> TOAST.  We have some material in the FAQ but there's precious little
> mention in the manual.  Thoughts?
>
>             regards, tom lane

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

Re: Performance impact of record sizes

From
Bruce Momjian
Date:
Bruce Momjian wrote:
>
> I see in the data types section for character types this text:
>
>      There are no performance differences between these three types,
>      apart from the increased storage size when using the blank-padded
>      type.
>
> I can't improve on that.

I added a mention of TOAST in the character data type docs:

    Long values are also stored in background tables so they don't
    interfere with rapid access to the shorter column values.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026