Обсуждение: How to estimate size of a row and therefore how much progress this query has made

Поиск
Список
Период
Сортировка

How to estimate size of a row and therefore how much progress this query has made

От
Greg Stark
Дата:
I'm running a long running query; it's been running since this morning. It's
inserting data into a table. I'm trying to estimate how many rows it has
inserted so far and how much longer it'll run. For one thing I'm running out
of space on this disk...

How do I measure the amount of space a record takes in a table?
The table structure it's inserting into looks like this:

      Column       |          Type          | Modifiers
-------------------+------------------------+-----------
 str               | character varying(100) |
 foo_id            | integer                |
 bar_id            | integer                |
 dist              | double precision       |
 geom2             | box                    |

The last two columns are actually being populated with nulls. The first three
columns are being populated with data. The string in the first column is
actually always 12 characters.

Incidentally, if I set the field to a maximum of 12 will it save space? (It
doesn't in Oracle for example.) And are there any table parameters that allow
postgres to pack data more densely or avoid problems if I fill in the null
fields later with updates?

So far it's inserted quite a bit of data:

-rw-------    1 postgres postgres     1.0G Feb  1 13:35 /var/lib/postgres/data/base/992336/7992140
-rw-------    1 postgres postgres     1.0G Feb  1 14:51 /var/lib/postgres/data/base/992336/7992140.1
-rw-------    1 postgres postgres     1.0G Feb  1 16:14 /var/lib/postgres/data/base/992336/7992140.2
-rw-------    1 postgres postgres     1.0G Feb  1 17:27 /var/lib/postgres/data/base/992336/7992140.3
-rw-------    1 postgres postgres     1.0G Feb  1 18:45 /var/lib/postgres/data/base/992336/7992140.4
-rw-------    1 postgres postgres     1.0G Feb  1 20:23 /var/lib/postgres/data/base/992336/7992140.5
-rw-------    1 postgres postgres     1.0G Feb  1 21:34 /var/lib/postgres/data/base/992336/7992140.6
-rw-------    1 postgres postgres     1.0G Feb  1 22:47 /var/lib/postgres/data/base/992336/7992140.7
-rw-------    1 postgres postgres     577M Feb  1 23:30 /var/lib/postgres/data/base/992336/7992140.8

I think it'll insert a total of 107,535,604 records.

What I'm trying to figure out is, a) How much space will I need to make sure I
have for this table and b) will it finish within the next 24 hours or should I
just give up and rethink my plan.

I guess I have other questions like are there any tips on making inserting
lots of records fast. (Does turning off fsync help on big inserts or only when
you have lots of small queries?)

I'm also going to have to make at least one index on this table. I'm dreading
that. How do I estimate how much space the index will take up?


--
greg

Re: How to estimate size of a row and therefore how much progress this query has made

От
Tom Lane
Дата:
Greg Stark <gsstark@mit.edu> writes:
> How do I measure the amount of space a record takes in a table?
> The table structure it's inserting into looks like this:

>       Column       |          Type          | Modifiers
> -------------------+------------------------+-----------
>  str               | character varying(100) |
>  foo_id            | integer                |
>  bar_id            | integer                |
>  dist              | double precision       |
>  geom2             | box                    |

> The last two columns are actually being populated with nulls. The first three
> columns are being populated with data. The string in the first column is
> actually always 12 characters.

Okay, so 4+12+4+4 data bytes, plus header overhead --- is this 7.3 or
something older?  Let's assume older --- 24+32+4 bytes per row, plus
the null bitmap and alignment padding; say 64 bytes per row.

> So far it's inserted quite a bit of data: [8.5 Gig]

> I think it'll insert a total of 107,535,604 records.

Should be finishing pretty darn soon; in fact, I'd venture you already
had a good bit of data in the table.   You don't have much over 6 gig
accounted for here.

> I'm also going to have to make at least one index on this table. I'm dreading
> that. How do I estimate how much space the index will take up?

As above, except the header overhead is much less (I think 8+4 bytes per
row).

            regards, tom lane

Re: How to estimate size of a row and therefore how much progress this query has made

От
Greg Stark
Дата:
Thanks a lot for your help.

Tom Lane <tgl@sss.pgh.pa.us> writes:

> > The last two columns are actually being populated with nulls. The first three
> > columns are being populated with data. The string in the first column is
> > actually always 12 characters.
>
> Okay, so 4+12+4+4 data bytes, plus header overhead --- is this 7.3 or
> something older?  Let's assume older --- 24+32+4 bytes per row, plus
> the null bitmap and alignment padding; say 64 bytes per row.

It's 7.3

> > So far it's inserted quite a bit of data: [8.5 Gig]
>
> > I think it'll insert a total of 107,535,604 records.
>
> Should be finishing pretty darn soon; in fact, I'd venture you already
> had a good bit of data in the table.   You don't have much over 6 gig
> accounted for here.

Nope, the table was empty previously. I created it just before the insert.

Hm, I guess I have a bit of a puzzle.

Do OIDs consume space too? This is a INSERT INTO foo () (select ..) style
query so the records get OIDs no? I probably should get in the habit of
creating all my tables without OIDs since I don't use them. I haven't thought
about them previously.


--
greg

Re: How to estimate size of a row and therefore how much progress this query has made

От
Tom Lane
Дата:
Greg Stark <gsstark@mit.edu> writes:
>> Okay, so 4+12+4+4 data bytes, plus header overhead --- is this 7.3 or
>> something older?  Let's assume older --- 24+32+4 bytes per row, plus
>> the null bitmap and alignment padding; say 64 bytes per row.

> It's 7.3

In that case it should be 4 or so bytes less per row.  How sure are you
about your estimated row count?  You are way over what I figure the
table size ought to be.

> Do OIDs consume space too?

In 7.3, yes.  (Before that, you paid for the space whether you wanted
'em or not.)  You could shave another 4 bytes off if you created the
table without OIDs.

            regards, tom lane

Re: How to estimate size of a row and therefore how much progress this query has made

От
Greg Stark
Дата:
Tom Lane <tgl@sss.pgh.pa.us> writes:

> > So far it's inserted quite a bit of data: [8.5 Gig]
>
> > I think it'll insert a total of 107,535,604 records.

Sorry, I found the problem. I was overflowing Emacs's integer datatype doing
my calculations for how many rows to expect. Postgres's analyze had the
correct answer. It's actually going to insert more than that, 375,971,060
records.

So that makes it 24G of data which won't fit on the partition.

I'll have to look for another way to do this.

Thanks again for your help.


--
greg