Thread: [GENERAL] huge table occupation after updates

[GENERAL] huge table occupation after updates

From:
Tom DalPozzo
Date:
Hi,
I've a table ('stato') with an indexed bigint ('Id') and 5 bytea fields ('d0','d1',...,'d4').
I populated the table with 10000 rows; each d.. field was initialized with 20 bytes.
Reported table size is 1.5MB. OK.
Now, for 1000 times, I updated 2000 different rows each time, changing the d0 field but keeping the same length, and at the end of it all I issued VACUUM.
Now table size is 29MB.

Why so big? What is an upper bound for estimating a table's occupation on disk?
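In short, the test was roughly this (a sketch, not the exact script I ran):

  CREATE TABLE stato (
    "Id" bigint,
    d0 bytea, d1 bytea, d2 bytea, d3 bytea, d4 bytea
  );
  CREATE INDEX ON stato ("Id");

  -- 10000 rows, every dX field filled with 20 bytes
  INSERT INTO stato
  SELECT g, decode(repeat('aa', 20), 'hex'), decode(repeat('aa', 20), 'hex'),
            decode(repeat('aa', 20), 'hex'), decode(repeat('aa', 20), 'hex'),
            decode(repeat('aa', 20), 'hex')
  FROM generate_series(1, 10000) AS g;

  SELECT pg_size_pretty(pg_total_relation_size('stato'));   -- ~1.5MB here

  -- one of the 1000 rounds: 2000 rows (a different block each round), d0 keeps its length
  UPDATE stato SET d0 = decode(repeat('bb', 20), 'hex') WHERE "Id" BETWEEN 1 AND 2000;

  VACUUM stato;
  SELECT pg_size_pretty(pg_total_relation_size('stato'));   -- 29MB after all the rounds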


The same test, redone with dX length = 200 bytes instead of 20, reports:
Size before UPDATEs = 11MB. OK.
Size after UPDATEs = 1.7GB. Why?

Attached is a txt file with details of the statistical commands I issued (max row size, row count, etc.).

Regards
Pupillo

Attachments

Re: [GENERAL] huge table occupation after updates

From:
Andreas Kretschmer
Date:
Tom DalPozzo <t.dalpozzo@gmail.com> wrote:

> Hi,
> I've a table ('stato') with an indexed bigint ('Id') and 5 bytea fields
> ('d0','d1',...,'d4').
> I populated the table with 10000 rows; each d.. field inizialized with 20
> bytes.
> Reported table size is 1.5MB. OK.
> Now, for 1000 times,  I update  2000 different rows each time, changing d0
> filed keeping the same length, and at the end of all,  I issued VACUUM.
> Now table size is 29MB. 
>
> Why so big? What is an upper bound to estimate a table occupation on disk?

Every (!) UPDATE creates a new row version and marks the old row as
'old', but doesn't delete the old row.

A VACUUM marks old rows as reusable - if there is no running
transaction that can still see the old row version. That's how MVCC works in
PostgreSQL.
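You can watch those old row versions, e.g. with something like this (a sketch, using your 'stato' table):

  -- live vs. dead (old, not yet reclaimed) row versions
  SELECT relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
  FROM pg_stat_user_tables
  WHERE relname = 'stato';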



Regards, Andreas Kretschmer
--
Andreas Kretschmer
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [GENERAL] huge table occupation after updates

From:
Francisco Olarte
Date:
Hi Tom

On Sat, Dec 10, 2016 at 1:15 PM, Tom DalPozzo <t.dalpozzo@gmail.com> wrote:
...
> Reported table size is 1.5MB. OK.
That's 150 bytes per row, pretty normal.
> Now, for 1000 times,  I update  2000 different rows each time, changing d0
> filed keeping the same length, and at the end of all,  I issued VACUUM.

And probably autovacuum or something similar kicked in meanwhile. 2M
updates is 200 updates per row; that's pretty heavy traffic - many
tables do not get that in their whole lifetime.

> Now table size is 29MB.
> Why so big? What is an upper bound to estimate a table occupation on disk?

Strictly, you could probably calculate an upper bound as row
size * number of insertions, given that an update approximates an insertion
plus a deletion. Given the original insertion used 1.5MB and you
repeated that about 200 times, I would estimate 201 * 1.5 = 301.5MB as an
upper bound, but I doubt that's of any use.

Those many updates probably left your table badly fragmented, with few
live rows per page.  Under normal usage you do not need to worry, as
periodic vacuum would mark the space for reuse and the table would not
grow that big. But issuing 1k updates on 20% of the table is hardly
normal usage; if you need this kind of usage maybe you should rethink
your strategies.

VACUUM FULL will probably pack the table and 'recover' the space; it
should be fast with just 29MB on disk. Not knowing your intended usage
nothing can be recommended, but I've had some usage patterns where a
heavy update plus VACUUM FULL was successfully used.
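Something like this (a sketch, table name taken from your first mail):

  VACUUM FULL stato;  -- rewrites the table compactly, but takes an exclusive lock
  SELECT pg_size_pretty(pg_total_relation_size('stato'));  -- table + indexes afterwards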

Francisco Olarte.


Re: [GENERAL] huge table occupation after updates

From:
Tom DalPozzo
Date:
Hi,
you're right, VACUUM FULL recovered the space completely.
So, at this point I'm worried about my needs.
I cannot issue VACUUM FULL, as I read it locks the table.
In my DB, I (would) need to have a table with one bigint id field + 10 bytea fields, 100 bytes long each (more or less, not fixed).
5/10000 rows maximum, but let's say 5000.
As traffic I can suppose 10000 updates per row per day (spread over groups of hours); each update involves two of those fields, chosen randomly.
Also rows are chosen randomly (in my test I used a block of 2000 just to try one possibility).
So, it's a total of 50 million updates per day, hence (50 million * 100 bytes * 2 fields updated) 10GB net per day.
I'm afraid it's not possible, according to my results.
Regards
Pupillo

2016-12-10 13:38 GMT+01:00 Francisco Olarte <folarte@peoplecall.com>:
Hi Tom

On Sat, Dec 10, 2016 at 1:15 PM, Tom DalPozzo <t.dalpozzo@gmail.com> wrote:
...
> Reported table size is 1.5MB. OK.
That's 150 bytes per row, prety normal.
> Now, for 1000 times,  I update  2000 different rows each time, changing d0
> filed keeping the same length, and at the end of all,  I issued VACUUM.

And probably autovacuum or something similar kicked in meanwhile. 2M
updates is 200 updates per row, that's pretty heavy traffic, many
tables do not get that in their whole lifetime.

> Now table size is 29MB.
> Why so big? What is an upper bound to estimate a table occupation on disk?

Strictly, you could probably calculate an upper bound as row
size*number or insertions, given an update aproximates an insertion
plus deletion. Given the original insertion used 1.5Mb and you
repeated thar about 200 times, I would stimate 201*1.5=301.5Mb as an
upper bound, but I doubt that's of any use.

Those many updates probably left your table badly fragmented, with few
rows per page.  On a normal usage you do not need to worry, as
periodic vacuum would mark the space for reuse and the table will not
grow that big. But issuing an 1k updates on 20% of the table is hardly
normal usage, if you need this kind of usage maybe you should rethink
your strategies.

Vacuum full will probably pack the table and ''recover'' the space, it
should be fast with just 29Mb on disk. Not knowing your intended usage
nothing can be recommended, but I've had some usage patterns where a
heavy update plus vacuuum full was successfully used.

Francisco Olarte.

Re: [GENERAL] huge table occupation after updates

From:
Rob Sargent
Date:
> On Dec 10, 2016, at 6:25 AM, Tom DalPozzo <t.dalpozzo@gmail.com> wrote:
>
> Hi,
> you're right, VACUUM FULL  recovered the space, completely.
> So, at this point I'm worried about my needs.
> I cannot issue vacuum full as I read it locks the table.
> In my DB, I (would) need to have a table with one bigint id field+ 10 bytea fields, 100 bytes long each (more or less, not fixed).
> 5/10000 rows maximum, but let's say 5000.
> As traffic I can suppose 10000 updates per row per day (spread over groups of hours; each update involving two of those fields, randomly.
> Also rows are chosen randomly (in my test I used a block of 2000 just to try one possibility).
> So, it's a total of 50 millions updates per day, hence (50millions * 100 bytes *2 fields updated) 10Gbytes net per day.
> I'm afraid it's not possible, according to my results.
> Reagrds
> Pupillo
>

Are each of the updates visible to a user or read/analyzed by another activity?  If not, you can do most of the update in memory and flush a snapshot periodically to the database.



Re: [GENERAL] huge table occupation after updates

From:
Tom DalPozzo
Date:
Hi,
I'd like to do that! But my DB must be crash proof! Very high reliability is a must.
I also use sync replication.
Regards
Pupillo




2016-12-10 16:04 GMT+01:00 Rob Sargent <robjsargent@gmail.com>:

> On Dec 10, 2016, at 6:25 AM, Tom DalPozzo <t.dalpozzo@gmail.com> wrote:
>
> Hi,
> you're right, VACUUM FULL  recovered the space, completely.
> So, at this point I'm worried about my needs.
> I cannot issue vacuum full as I read it locks the table.
> In my DB, I (would) need to have a table with one bigint id field+ 10 bytea fields, 100 bytes long each (more or less, not fixed).
> 5/10000 rows maximum, but let's say 5000.
> As traffic I can suppose 10000 updates per row per day (spread over groups of hours; each update involving two of those fields, randomly.
> Also rows are chosen randomly (in my test I used a block of 2000 just to try one possibility).
> So, it's a total of 50 millions updates per day, hence (50millions * 100 bytes *2 fields updated) 10Gbytes net per day.
> I'm afraid it's not possible, according to my results.
> Reagrds
> Pupillo
>

Are each of the updates visible to a user or read/analyzed by another activity?  If not you can do most of the update in memory and flush a snapshot periodically to the database.


Re: [GENERAL] huge table occupation after updates

From:
Rob Sargent
Date:
> On Dec 10, 2016, at 7:27 AM, Tom DalPozzo <t.dalpozzo@gmail.com> wrote:
>
> Hi,
> I'd like to do that! But my DB must be crash proof! Very high reliability is a must.
> I also use sycn replication.
> Regards
> Pupillo
>
>
>
>
> Are each of the updates visible to a user or read/analyzed by another activity?  If not you can do most of the update in memory and flush a snapshot periodically to the database.
>
>

This list discourages top posting. You’re asked to place your reply at the bottom.

You haven’t laid out your application architecture (how many clients, who is reading, who is writing, etc). Caching doesn’t mean your database is any less crash proof.  At that rate of activity, depending on architecture, you could lose updates in all sorts of crash scenarios.

Re: [GENERAL] huge table occupation after updates

From:
Tom DalPozzo
Date:
2016-12-10 16:36 GMT+01:00 Rob Sargent <robjsargent@gmail.com>:

> On Dec 10, 2016, at 7:27 AM, Tom DalPozzo <t.dalpozzo@gmail.com> wrote:
>
> Hi,
> I'd like to do that! But my DB must be crash proof! Very high reliability is a must.
> I also use sycn replication.
> Regards
> Pupillo
>
>
>
>
> Are each of the updates visible to a user or read/analyzed by another activity?  If not you can do most of the update in memory and flush a snapshot periodically to the database.
>
>

This list discourages top posting. You’re asked to place your reply at the bottom

You haven’t laid out you’re application architecture (how many clients, who is reading who is writing, etc). Caching doesn’t mean your database is any less crash proof.  At that rate of activity, depending on architecture, you could lose updates in all sorts of crash scenarios.

As for crash proof, I meant that once my client app is told that her update request was committed, it mustn't get lost (hdd failure apart, of course). And I can't wait to flush the cache before telling the app "committed".
I can also replicate the cache on the standby PC, of course.
Regards
Pupillo

Re: [GENERAL] huge table occupation after updates

From:
Rob Sargent
Date:
> On Dec 10, 2016, at 10:01 AM, Tom DalPozzo <t.dalpozzo@gmail.com> wrote:
>
> 2016-12-10 16:36 GMT+01:00 Rob Sargent <robjsargent@gmail.com>:
>
> > On Dec 10, 2016, at 7:27 AM, Tom DalPozzo <t.dalpozzo@gmail.com> wrote:
> >
> > Hi,
> > I'd like to do that! But my DB must be crash proof! Very high reliability is a must.
> > I also use sycn replication.
> > Regards
> > Pupillo
> >
> >
> >
> >
> > Are each of the updates visible to a user or read/analyzed by another activity?  If not you can do most of the update in memory and flush a snapshot periodically to the database.
> >
> >
>
> This list discourages top posting. You’re asked to place your reply at the bottom
>
> You haven’t laid out you’re application architecture (how many clients, who is reading who is writing, etc). Caching doesn’t mean your database is any less crash proof.  At that rate of activity, depending on architecture, you could lose updates in all sorts of crash scenarios.
>
> As for crash proof, I meant that once my client app is told that her update request was committed, it mustn't get lost (hdd failure apart of course). And I can't wait to flush the cache before telling to the app: "committed".
> ​I can replicate also the cache on the standby PC of course. ​
> Regards
> Pupillo
>
>
>
> ​
>

OK, clientA sends an update; you commit and tell clientA committed. clientB updates the same record; do you tell clientA of clientB’s update?
Are the two updates cumulative or destructive?
Can you report all updates done by clientA?

Re: [GENERAL] huge table occupation after updates

From:
Tom DalPozzo
Date:
2016-12-10 18:10 GMT+01:00 Rob Sargent <robjsargent@gmail.com>:

> On Dec 10, 2016, at 10:01 AM, Tom DalPozzo <t.dalpozzo@gmail.com> wrote:
>
> 2016-12-10 16:36 GMT+01:00 Rob Sargent <robjsargent@gmail.com>:
>
> > On Dec 10, 2016, at 7:27 AM, Tom DalPozzo <t.dalpozzo@gmail.com> wrote:
> >
> > Hi,
> > I'd like to do that! But my DB must be crash proof! Very high reliability is a must.
> > I also use sycn replication.
> > Regards
> > Pupillo
> >
> >
> >
> >
> > Are each of the updates visible to a user or read/analyzed by another activity?  If not you can do most of the update in memory and flush a snapshot periodically to the database.
> >
> >
>
> This list discourages top posting. You’re asked to place your reply at the bottom
>
> You haven’t laid out you’re application architecture (how many clients, who is reading who is writing, etc). Caching doesn’t mean your database is any less crash proof.  At that rate of activity, depending on architecture, you could lose updates in all sorts of crash scenarios.
>
> ​As for crash proof, I meant that once my client app is told that her update request was committed, it mustn't get lost (hdd failure apart of course). And I can't wait to flush the cache before telling to the app :"committed".
> ​I can replicate also the cache on the standby PC of course. ​
> Regards
> Pupillo
>
>
>
> ​
>

OK clientA sends an update; you commit and tell clientA committed. clientB updates same record; Do you tell clientA of clientB’s update?
Are the two updates cumulative or destructive.
Can you report all updates done by clientA?

I have one direct DB client (let's name it MIDAPP) only. This client of the DB is a server for up to 10000 final clients.
Any time MIDAPP is going to reply to a client, it must save a "status record with some data" related to that client, and only after that does it answer/commit to the final client.
The next time the same final client asks something, the same status record will be updated again (with different content).
Each client can send up to 10000 reqs per day, up to 1 per second.
So, if I lose the cache, I'm done. I don't want to send the status to the final clients (in order to get it back in case of a crash).
I can evaluate using a periodic snapshot + a sequential log for tracking the newer updates, but that's still to be evaluated.
Regards
Pupillo

Re: [GENERAL] huge table occupation after updates

From:
Francisco Olarte
Date:
A couple of things first.

1.- This list encourages inline replying, editing the text, and frowns
upon top posting.

2.- Your HTML formatting with such a small font size makes it harder for me
(and I can assume some others) to properly read your messages.

If you want to discourage people replying to you, keep doing the two above.

On Sat, Dec 10, 2016 at 3:25 PM, Tom DalPozzo <t.dalpozzo@gmail.com> wrote:
> you're right, VACUUM FULL  recovered the space, completely.

Well, it always does. ;-)

> So, at this point I'm worried about my needs.
> I cannot issue vacuum full as I read it locks the table.

Well, first hint of your needs. Bear in mind VACUUM FULL can be very
fast on small tables (i.e., if you have the 1.5MB table, do 2000
updates and then a VACUUM FULL, that will be very fast - time it).
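For example (a sketch, in psql):

  \timing on
  -- a batch of same-length updates, then see how long the full vacuum takes
  UPDATE stato SET d0 = decode(repeat('cc', 20), 'hex') WHERE "Id" BETWEEN 1 AND 2000;
  VACUUM FULL stato;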

> In my DB, I (would) need to have a table with one bigint id field+ 10 bytea
> fields, 100 bytes long each (more or less, not fixed).
> 5/10000 rows maximum, but let's say 5000.
> As traffic I can suppose 10000 updates per row per day (spread over groups
> of hours; each update involving two of those fields, randomly.
> Also rows are chosen randomly (in my test I used a block of 2000 just to try
> one possibility).
> So, it's a total of 50 millions updates per day, hence (50millions * 100
> bytes *2 fields updated) 10Gbytes net per day.

Not at all. That's the volume of updated data; you must multiply by
the ROW size, not just the changed size - in your case 50M * 1100 (to
have some wiggle room), 55GB.

But this is the UPPER BOUND you asked for. Not the real one.

> I'm afraid it's not possible, according to my results.

It certainly is. You can set a very aggressive autovacuum schedule for
the table, or even better, you may vacuum AFTER each hourly update.
This will mark dead tuples for reuse. It will not be as fast, but it
can certainly be fast enough.
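For instance (a sketch; the thresholds are just a starting point to tune, not a recommendation):

  -- per-table settings: vacuum 'stato' as soon as a couple thousand rows are dead
  ALTER TABLE stato SET (
    autovacuum_vacuum_scale_factor = 0,
    autovacuum_vacuum_threshold    = 2000,
    autovacuum_vacuum_cost_delay   = 0
  );

  -- or simply, after each batch of updates:
  VACUUM stato;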

And, if you only update once an hour, you may try other tricks (like
copy to a temp table, truncate the original and insert the temp table
back into the original; I fear this will lock the table too, but it
will be for a very short time, and your readers may well tolerate it).
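Roughly (a sketch of that trick; the TRUNCATE takes a short exclusive lock):

  BEGIN;
  CREATE TEMP TABLE stato_tmp AS SELECT * FROM stato;  -- compact copy of the live rows
  TRUNCATE stato;                                      -- reclaims the bloated space
  INSERT INTO stato SELECT * FROM stato_tmp;
  DROP TABLE stato_tmp;
  COMMIT;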

Yours seems a special app with special needs; try a few approaches,
measure, it is certainly possible.

Francisco Olarte.


Re: [GENERAL] huge table occupation after updates

From:
Francisco Olarte
Date:
Tom:

On Sat, Dec 10, 2016 at 6:01 PM, Tom DalPozzo <t.dalpozzo@gmail.com> wrote:
> As for crash proof, I meant that once my client app is told that her update
> request was committed, it mustn't get lost (hdd failure apart of course).
> And I can't wait to flush the cache before telling to the app :"committed".
> I can replicate also the cache on the standby PC of course.

You are making inconsistent requests. When the server tells your app
it's committed, it has flushed the transaction log cache. If your
assertion above is real, you cannot wait for commit, so your
requirements are impossible to satisfy (of course, you could run with
scissors, but that will lose data even without hdd failure).
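The "run with scissors" option would be something like this (a sketch; it trades durability of the most recent transactions for speed):

  -- per session or per transaction: COMMIT returns before the WAL is flushed to disk
  SET synchronous_commit = off;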

Francisco Olarte.


Re: [GENERAL] huge table occupation after updates

From:
Tom DalPozzo
Date:
2016-12-10 18:33 GMT+01:00 Francisco Olarte <folarte@peoplecall.com>:
Tom:

On Sat, Dec 10, 2016 at 6:01 PM, Tom DalPozzo <t.dalpozzo@gmail.com> wrote:
> As for crash proof, I meant that once my client app is told that her update
> request was committed, it mustn't get lost (hdd failure apart of course).
> And I can't wait to flush the cache before telling to the app :"committed".
> I can replicate also the cache on the standby PC of course.

You are making inconsistent requests. When the server tells your app
it's commited, it has flush the transaction log cache. If your
assertion about is real, you cannot wait for commit, so your
requirements are imposible to satisfy ( of course, you could run with
scissors, but that will loose data without hdd failure ).

Francisco Olarte.

Hi, perhaps I was not clear. The cache I mentioned is a possible cache in my app, outside the PostgreSQL server.
I answered Rob's question with more details regarding my app.

Regards
Pupillo



Re: [GENERAL] huge table occupation after updates

From:
Tom DalPozzo
Date:


2016-12-10 18:30 GMT+01:00 Francisco Olarte <folarte@peoplecall.com>:
A couple of things first.

1.- This list encourages inline replying, editing the text, and frowns
upon top posting.

2.- Your HTML formatting with so a small size makes it harder for me (
and I can assume some others ) to properly read your messages.

If you want to discourage people replying to you, keep doing the two above.

On Sat, Dec 10, 2016 at 3:25 PM, Tom DalPozzo <t.dalpozzo@gmail.com> wrote:
> you're right, VACUUM FULL  recovered the space, completely.

Well, it always does. ;-)

> So, at this point I'm worried about my needs.
> I cannot issue vacuum full as I read it locks the table.

Well, first hint of your needs. Bear in mind vacuum fulls can be very
fast on small tables ( i.e, if you have the 1.5Mb table, and do 2000
updates and then a vacuum full that will be very fast, time it ).

> In my DB, I (would) need to have a table with one bigint id field+ 10 bytea
> fields, 100 bytes long each (more or less, not fixed).
> 5/10000 rows maximum, but let's say 5000.
> As traffic I can suppose 10000 updates per row per day (spread over groups
> of hours; each update involving two of those fields, randomly.
> Also rows are chosen randomly (in my test I used a block of 2000 just to try
> one possibility).
> So, it's a total of 50 millions updates per day, hence (50millions * 100
> bytes *2 fields updated) 10Gbytes net per day.

Not at all. That's the volume of updated data, you must multiply by
the ROW size, not just the changed size, in your case 50M * 1100 ( to
have some wiggle room ), 55Gbytes.

But this is the UPPER BOUND you asked for. Not the real one.

> I'm afraid it's not possible, according to my results.

It certaninly is. You can set a very aggresive autovacuum schedule for
the table, or even better, you may vacuum AFTER each hourly update.
This will mark dead tuples for reuse. It will not be as fast, but it
can certainly be fast enough.

And, if you only update once an hour, you may try other tricks ( like
copy to a temp table, truncate the original and insert the temp table
in the original, although I fear this will lock the table too, but it
will be a very short time, your readers may well tolerate it. )

Yours seem a special app with special need, try a few, measure, it is
certainly possible.

Francisco Olarte.

Hi, I think you're right. I was surprised by the huge size of the tables in my tests, but I had not considered the vacuum properly.
My test had really huge activity, so perhaps autovacuum didn't have time to make the rows reusable.
Also, issuing a plain VACUUM command does nothing visible at once; only later, when inserting new rows, does the size stop increasing.
I will try again as you suggest.
Thank you very much
Pupillo


Re: [GENERAL] huge table occupation after updates

From:
Adrian Klaver
Date:
On 12/10/2016 09:30 AM, Francisco Olarte wrote:
> A couple of things first.
>
> 1.- This list encourages inline replying, editing the text, and frowns
> upon top posting.
>
> 2.- Your HTML formatting with so a small size makes it harder for me (
> and I can assume some others ) to properly read your messages.

+1. I either had to Ctrl + or put the 'readers' on:)

>
> If you want to discourage people replying to you, keep doing the two above.
>
> On Sat, Dec 10, 2016 at 3:25 PM, Tom DalPozzo <t.dalpozzo@gmail.com> wrote:
>> you're right, VACUUM FULL  recovered the space, completely.
>
> Well, it always does. ;-)
>
>> So, at this point I'm worried about my needs.
>> I cannot issue vacuum full as I read it locks the table.
>
> Well, first hint of your needs. Bear in mind vacuum fulls can be very
> fast on small tables ( i.e, if you have the 1.5Mb table, and do 2000
> updates and then a vacuum full that will be very fast, time it ).
>
>> In my DB, I (would) need to have a table with one bigint id field+ 10 bytea
>> fields, 100 bytes long each (more or less, not fixed).
>> 5/10000 rows maximum, but let's say 5000.
>> As traffic I can suppose 10000 updates per row per day (spread over groups
>> of hours; each update involving two of those fields, randomly.
>> Also rows are chosen randomly (in my test I used a block of 2000 just to try
>> one possibility).
>> So, it's a total of 50 millions updates per day, hence (50millions * 100
>> bytes *2 fields updated) 10Gbytes net per day.
>
> Not at all. That's the volume of updated data, you must multiply by
> the ROW size, not just the changed size, in your case 50M * 1100 ( to
> have some wiggle room ), 55Gbytes.
>
> But this is the UPPER BOUND you asked for. Not the real one.
>
>> I'm afraid it's not possible, according to my results.
>
> It certaninly is. You can set a very aggresive autovacuum schedule for
> the table, or even better, you may vacuum AFTER each hourly update.
> This will mark dead tuples for reuse. It will not be as fast, but it
> can certainly be fast enough.
>
> And, if you only update once an hour, you may try other tricks ( like
> copy to a temp table, truncate the original and insert the temp table
> in the original, although I fear this will lock the table too, but it
> will be a very short time, your readers may well tolerate it. )
>
> Yours seem a special app with special need, try a few, measure, it is
> certainly possible.
>
> Francisco Olarte.
>
>


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: [GENERAL] huge table occupation after updates

From:
Adrian Klaver
Date:
On 12/10/2016 10:15 AM, Tom DalPozzo wrote:
>
>
> 2016-12-10 18:30 GMT+01:00 Francisco Olarte <folarte@peoplecall.com
> <mailto:folarte@peoplecall.com>>:
>
>     A couple of things first.
>
>     1.- This list encourages inline replying, editing the text, and frowns
>     upon top posting.
>
>     2.- Your HTML formatting with so a small size makes it harder for me (
>     and I can assume some others ) to properly read your messages.
>
>     If you want to discourage people replying to you, keep doing the two
>     above.
>
>     On Sat, Dec 10, 2016 at 3:25 PM, Tom DalPozzo <t.dalpozzo@gmail.com
>     <mailto:t.dalpozzo@gmail.com>> wrote:
>     > you're right, VACUUM FULL  recovered the space, completely.
>
>     Well, it always does. ;-)
>
>     > So, at this point I'm worried about my needs.
>     > I cannot issue vacuum full as I read it locks the table.
>
>     Well, first hint of your needs. Bear in mind vacuum fulls can be very
>     fast on small tables ( i.e, if you have the 1.5Mb table, and do 2000
>     updates and then a vacuum full that will be very fast, time it ).
>
>     > In my DB, I (would) need to have a table with one bigint id field+
>     10 bytea
>     > fields, 100 bytes long each (more or less, not fixed).
>     > 5/10000 rows maximum, but let's say 5000.
>     > As traffic I can suppose 10000 updates per row per day (spread
>     over groups
>     > of hours; each update involving two of those fields, randomly.
>     > Also rows are chosen randomly (in my test I used a block of 2000
>     just to try
>     > one possibility).
>     > So, it's a total of 50 millions updates per day, hence (50millions
>     * 100
>     > bytes *2 fields updated) 10Gbytes net per day.
>
>     Not at all. That's the volume of updated data, you must multiply by
>     the ROW size, not just the changed size, in your case 50M * 1100 ( to
>     have some wiggle room ), 55Gbytes.
>
>     But this is the UPPER BOUND you asked for. Not the real one.
>
>     > I'm afraid it's not possible, according to my results.
>
>     It certaninly is. You can set a very aggresive autovacuum schedule for
>     the table, or even better, you may vacuum AFTER each hourly update.
>     This will mark dead tuples for reuse. It will not be as fast, but it
>     can certainly be fast enough.
>
>     And, if you only update once an hour, you may try other tricks ( like
>     copy to a temp table, truncate the original and insert the temp table
>     in the original, although I fear this will lock the table too, but it
>     will be a very short time, your readers may well tolerate it. )
>
>     Yours seem a special app with special need, try a few, measure, it is
>     certainly possible.
>
>     Francisco Olarte.
>
>
> ​Hi, ​I think you're right. I was surprised by the huge size of the
> tables in my tests but I had not considered the vacuum properly.
> My test had a really huge activity so perhaps the autovacuum didn't have
> time to make the rows reusable.
> Also, issuing plain VACUUM command does nothing visibile at once, but
> only after when, inserting new rows, the size doesn't increase.
> I will try again as you suggest.

To make more sense of this I would suggest reading the following
sections of the manual:

https://www.postgresql.org/docs/9.5/static/routine-vacuuming.html

https://www.postgresql.org/docs/9.5/static/mvcc.html


There is a lot of ground covered in the above, more than can be digested
in one pass, but it will help provide some context for the
answers/suggestions provided in this thread.

> Thank you very much
> Pupillo
>
>


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: [GENERAL] huge table occupation after updates

From:
Alban Hertroys
Date:
Please use a readable font. Your messages are using a font that's so small that my eyes start to hurt. I still try to read them, but I - and I assume others - will stop trying if you keep this up.

Sorry for the top-post, but since it's not directly appropriate to the topic that's perhaps for the better.

> On 10 Dec 2016, at 19:15, Tom DalPozzo <t.dalpozzo@gmail.com> wrote:
>
>
>
> 2016-12-10 18:30 GMT+01:00 Francisco Olarte <folarte@peoplecall.com>:
> A couple of things first.
>
> 1.- This list encourages inline replying, editing the text, and frowns
> upon top posting.
>
> 2.- Your HTML formatting with so a small size makes it harder for me (
> and I can assume some others ) to properly read your messages.
>
> If you want to discourage people replying to you, keep doing the two above.
>
> On Sat, Dec 10, 2016 at 3:25 PM, Tom DalPozzo <t.dalpozzo@gmail.com> wrote:
> > you're right, VACUUM FULL  recovered the space, completely.
>
> Well, it always does. ;-)
>
> > So, at this point I'm worried about my needs.
> > I cannot issue vacuum full as I read it locks the table.
>
> Well, first hint of your needs. Bear in mind vacuum fulls can be very
> fast on small tables ( i.e, if you have the 1.5Mb table, and do 2000
> updates and then a vacuum full that will be very fast, time it ).
>
> > In my DB, I (would) need to have a table with one bigint id field+ 10 bytea
> > fields, 100 bytes long each (more or less, not fixed).
> > 5/10000 rows maximum, but let's say 5000.
> > As traffic I can suppose 10000 updates per row per day (spread over groups
> > of hours; each update involving two of those fields, randomly.
> > Also rows are chosen randomly (in my test I used a block of 2000 just to try
> > one possibility).
> > So, it's a total of 50 millions updates per day, hence (50millions * 100
> > bytes *2 fields updated) 10Gbytes net per day.
>
> Not at all. That's the volume of updated data, you must multiply by
> the ROW size, not just the changed size, in your case 50M * 1100 ( to
> have some wiggle room ), 55Gbytes.
>
> But this is the UPPER BOUND you asked for. Not the real one.
>
> > I'm afraid it's not possible, according to my results.
>
> It certaninly is. You can set a very aggresive autovacuum schedule for
> the table, or even better, you may vacuum AFTER each hourly update.
> This will mark dead tuples for reuse. It will not be as fast, but it
> can certainly be fast enough.
>
> And, if you only update once an hour, you may try other tricks ( like
> copy to a temp table, truncate the original and insert the temp table
> in the original, although I fear this will lock the table too, but it
> will be a very short time, your readers may well tolerate it. )
>
> Yours seem a special app with special need, try a few, measure, it is
> certainly possible.
>
> Francisco Olarte.
>
> Hi, I think you're right. I was surprised by the huge size of the tables in my tests but I had not considered the vacuum properly.
> My test had a really huge activity so perhaps the autovacuum didn't have time to make the rows reusable.
> Also, issuing plain VACUUM command does nothing visibile at once, but only after when, inserting new rows, the size doesn't increase.
> I will try again as you suggest.
> Thank you very much
> Pupillo

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.



Re: [GENERAL] huge table occupation after updates

From:
"David G. Johnston"
Date:
On Saturday, December 10, 2016, Tom DalPozzo <t.dalpozzo@gmail.com> wrote:

​I have one direct DB client (let's name it MIDAPP) only. This client of the DB is a server for up to 10000 final clients.
Any time MIDAPP is going to reply to a client, it must save a "status record with some data" related to that client and only after that, answering /committing the final client.
The next time the same final client will ask something, the same status record will be updated again (with a different content).

Why do you want to pay for concurrency control when you don't seem to need it?  While PostgreSQL likely can do what you need, I suspect there are applications out there that can solve this specific problem better.  Even something as simple as a flat file, one per "final client", written atomically and fsynced after each write/rename.

David J.

Re: [GENERAL] huge table occupation after updates

From:
"t.dalpozzo@gmail.com"
Date:
On 12/12/2016 02:42, David G. Johnston wrote:
On Saturday, December 10, 2016, Tom DalPozzo <t.dalpozzo@gmail.com> wrote:

​I have one direct DB client (let's name it MIDAPP) only. This client of the DB is a server for up to 10000 final clients.
Any time MIDAPP is going to reply to a client, it must save a "status record with some data" related to that client and only after that, answering /committing the final client.
The next time the same final client will ask something, the same status record will be updated again (with a different content).

Why do you want to pay for concurrency control when you don't seem to need it?  While PostgreSQL likely can do what you need I suspect there are applications out there that can solve this specific problem better.  Even something as simple as a flat file, one per "final client", written atomically and fsynced after each write/rename.

David J,
Hi David,
there are also other DB clients which only perform read queries using SQL. That's the reason why I chose PostgreSQL over simpler apps. I didn't mention them so far as those queries are not a concern in terms of performance.

Anyway, regarding the huge size of the table, I think the reason was that autovacuum didn't keep up, as the update traffic in my test was really high, with no pause. In fact, if I lower it down to 1500 updates/sec, then autovacuum works (I checked the log).
So the table size can grow, but not forever, as the space gets reused.
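What I checked was roughly this (a sketch; setting log_autovacuum_min_duration = 0 in postgresql.conf makes every autovacuum run show up in the log):

  -- has autovacuum been keeping up on the table?
  SELECT relname, n_dead_tup, last_autovacuum, autovacuum_count
  FROM pg_stat_user_tables
  WHERE relname = 'stato';
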
Thank you very much.

Pupillo