Discussion: Proposal for unlogged tables

Proposal for unlogged tables

From:
Mark Zealey
Date:
I've recently been doing some performance testing with unlogged tables
versus logged tables on 9.5-rc1. Basically we're trying to do big loads of
data into the database periodically. If, on the rare occasion, the
server crashes halfway through an import it's no big deal, so I've been
looking specifically at unlogged tables with transactions having
synchronous_commit set to off. When we do the inserts on a logged table
with default WAL configuration settings we get a *lot* of disk IO
(500 MB/s of fairly random IO - we have a decent SSD RAID array,
but even so this maxes it out). Tweaking WAL settings (commit_delay,
max_wal_size, min_wal_size) improves the situation quite a bit
(50-100 MB/s of writes), but we still have no need to log these inserts
to the WAL at all.
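The load pattern described above can be sketched roughly as follows (table, column, and file names are illustrative, not from the original post):

```sql
-- Don't wait for WAL flush at commit; a small data-loss window on crash
-- is acceptable for this workload
SET synchronous_commit = off;

-- Unlogged table: its data is never written to the WAL
CREATE UNLOGGED TABLE import_batch (
    id      bigint PRIMARY KEY,
    payload text
);

-- Bulk load; COPY is generally much faster than row-by-row INSERTs
COPY import_batch FROM '/path/to/batch.csv' WITH (FORMAT csv);
```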

Doing the imports to unlogged tables we get virtually no IO until the
insert process has finished, when the table gets flushed to disk, which is
great for us. However, I read in the manual that if the server ever has
an unclean shutdown, all unlogged tables will be truncated. With 9.5 we
can now alter a table to be logged/unlogged after the insert, but this
still writes all the inserts into the WAL. I can understand the
requirement to truncate tables with active IO at the point of an unclean
shutdown, where you may get corrupted data; but I'm interested in how
hard it would be to skip the truncate for quiescent unlogged tables. If
the last data-modifying statement ran more than, say, 30 seconds or 1
minute before an unclean shutdown (or the data was otherwise flushed to
disk and there was no IO since then), can we not assume that the data is
not corrupted, and hence not truncate those unlogged tables?
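The 9.5 feature mentioned above looks like this (table name illustrative); note that it rewrites the whole table and its indexes into the WAL:

```sql
-- Convert to a regular (logged) table: the entire contents are written
-- to the WAL, and an ACCESS EXCLUSIVE lock is held for the duration
ALTER TABLE import_batch SET LOGGED;
```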

Thanks

Mark


Re: Proposal for unlogged tables

From:
Yves Dorfsman
Date:
On 2016-01-04 02:59, Mark Zealey wrote:
> shutdown all unlogged tables will be truncated. Obviously with 9.5 we can now
> alter tables to be logged/unlogged after insert but this will still write all
> the inserts into the WAL.

I haven't tried it, but won't converting an unlogged table into a logged
table write all the inserts at once, instead of once per insert?

Or do you want to do more bulk inserts into that table later?

> I can understand the requirement to truncate tables
> with active IO at the point of unclean shutdown where you may get corrupted
> data; but I'm interested to find out how easy it would be to not perform the
> truncate for historical unlogged tables.

Are you trying to avoid running a CHECKPOINT? Are you afraid the activity on
the other tables will create too much I/O?

> If the last data modification
> statement was run more than eg 30 seconds or 1 minute before an unclean
> shutdown (or the data was otherwise flushed to disk and there was no IO since
> then) can we not assume that the data is not corrupted and hence not truncate
> the unlogged tables?

I have to admit that I have been surprised by this; it feels like unlogged
tables are never written out properly unless you do an explicit CHECKPOINT.


--
http://yves.zioup.com
gpg: 4096R/32B0F416



Re: Proposal for unlogged tables

From:
Mark Zealey
Date:

On 04/01/16 16:27, Yves Dorfsman wrote:
> I haven't tried, but won't converting an unlogged table into a logged
> table write all the inserts at once instead of once per insert? Or are
> you wanting to do more bulk insert into that table later?
> Are you trying to avoid running a CHECKPOINT? Are you afraid the activity on
> the other tables will create too much I/O?

Setting a table to logged still pushes all the inserts into the WAL,
which we don't need, and causes a lot of extra IO. It also takes quite a
long time, as it basically rewrites the table and all its indexes (e.g. 60
seconds for 2m rows on one of my test tables). We can do this, but a) it
generates lots of additional IO which isn't really required for us, and
b) it acquires an exclusive lock on the table, which is also not nice for us.

>> If the last data modification
>> statement was run more than eg 30 seconds or 1 minute before an unclean
>> shutdown (or the data was otherwise flushed to disk and there was no IO since
>> then) can we not assume that the data is not corrupted and hence not truncate
>> the unlogged tables?
> I have to admit that I have been surprised by this; it feels like unlogged
> tables are never written out properly unless you do an explicit CHECKPOINT.

I don't know how the internals work, but unlogged tables are definitely
flushed to disk and persist through normal server restarts. It's just
that, according to the docs, if the server ever has an unclean shutdown
the tables are truncated, even if they have not been updated in a year. I
can't understand why it has to be like this, and it seems it would be
much nicer not to automatically truncate when it doesn't have to. This
would be great in situations where you can tolerate a low chance of
data loss but want very fast upserts.

Mark


Re: Proposal for unlogged tables

From:
Andres Freund
Date:
On 2016-01-04 16:38:40 +0200, Mark Zealey wrote:
> I don't know how the internals work but unlogged tables definitely flushed
> to disk and persist through normal server restarts. It is just according to
> the docs if the server ever has an unclean shutdown the tables are truncated
> even if they have not been updated in a year. I can't understand why it has
> to be like this and it seems that it would be much nicer to not
> automatically truncate if it doesn't have to.

Pages containing data of unlogged tables aren't ever flushed to disk
unless
a) a shutdown checkpoint is performed
b) a buffer containing data from an unlogged table is used for something
   else
c) the database being copied is the source of a CREATE DATABASE .. TEMPLATE

Hence, if there's an unclean shutdown, there's absolutely no guarantee
about the on-disk state of unlogged tables. Even if they haven't been
modified in ages, there could still have been many dirty pages in shared
buffers at crash time.


Always flushing dirty pages of unlogged tables at every checkpoint would
greatly increase the overhead for memory-resident, write-heavy workloads
that use unlogged tables.
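One way to see the mechanism behind the truncation: each unlogged table has an empty "init" fork on disk next to its main fork, and crash recovery resets the main fork from it. A small illustration (table name assumed from the examples above):

```sql
-- Path of the table's main fork, relative to the data directory;
-- the init fork is the same path with an "_init" suffix
SELECT pg_relation_filepath('import_batch');
```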

Andres


Re: Proposal for unlogged tables

From:
Mark Zealey
Date:

On 04/01/16 18:12, Andres Freund wrote:
> Pages containing data of unlogged tables aren't ever flushed to disk
> unless
> a) a shutdown checkpoint is performed
> b) a buffer containing data from an unlogged table is used for something
>     else
> c) the database being copied is the source of a CREATE DATABASE .. TEMPLATE
>
> Hence, if there's an unclean shutdown, there's absolutely no guarantee
> about the on-disk state of unlogged tables. Even if they haven't been
> modified in ages - there could have been many many dirty pages in shared
> buffers when crashing.
>
>
> Always flushing dirty pages of unlogged tables at checkpoint would
> greatly increase the overhead for memory resident, write heavy workloads
> that use unlogged tables.

If there were a command to flush a specific unlogged table to disk,
wouldn't that work around all these issues? Perhaps marking the table
as read-only at the same time would flush it to disk and ensure no
more data could be written to it (e.g. ALTER TABLE ... SET READ ONLY on an
unlogged table would flush it and not truncate it after a crash). In our
case this would be great, as we want to use these as permanent tables for
speed; but after the initial data load we don't change the data again, so
we could just do this at the end of the import process.
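The proposed command might look like this (hypothetical syntax; it does not exist in PostgreSQL, and the table name is illustrative):

```sql
-- Hypothetical: flush the table's dirty buffers and forbid further
-- writes, so that crash recovery could safely skip the truncate
ALTER TABLE import_batch SET READ ONLY;
```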

Mark


Re: Proposal for unlogged tables

From:
Andres Freund
Date:
On 2016-01-04 19:12:22 +0200, Mark Zealey wrote:
> If there was a command to flush a specific unlogged table to disk it would
> work around all these issues no? Perhaps if you marked the table as read
> only at the same time it would flush it to disk and ensure no more data
> could be written to it eg (ALTER TABLE ... SET READ ONLY on an unlogged
> table would flush + not truncate after crash). In our case this would be
> great as we want to use these as permanent tables for speed; but after an
> initial data dump we don't change the data again so we could just do this at
> the end of the import process.

It's more complex than that: even unmodified tables need to be processed
by vacuum every now and then (xid wraparound handling). It's probably
possible to work around such things, but it's not a line or ten.
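The wraparound concern can at least be pushed far into the future on a per-table basis with today's tools (table name illustrative):

```sql
-- Freeze all tuples now, so anti-wraparound autovacuum will not need
-- to touch this table again for a long time
VACUUM (FREEZE) import_batch;
```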

Andres