Re: unlogged tables
From: Andres Freund
Subject: Re: unlogged tables
Date: 2010-11-17 21:35
Msg-id: 201011172135.06272.andres@anarazel.de
In reply to: Re: unlogged tables (Robert Haas <robertmhaas@gmail.com>)
Responses: Re: unlogged tables
List: pgsql-hackers
On Wednesday 17 November 2010 20:54:14 Robert Haas wrote:
> On Wed, Nov 17, 2010 at 2:31 PM, Andrew Dunstan <andrew@dunslane.net> wrote:
> >>> The customer is always right, and I think we are hearing loud and clear
> >>> what the customers want. Please let's not go out of our way to create
> >>> a feature that isn't what they want.
> >>
> >> I would be fine with only having a safe shutdown with unlogged tables
> >> and skip the checkpoint I/O all other times.
> >
> > Yeah, I was just thinking something like that would be good, and should
> > overcome Robert's objection to the whole idea.
>
> Could we slow down here a bit and talk through the ideas here in a
> logical fashion?
>
> The customer is always right, but the informed customer makes better
> decisions than the uninformed customer. This idea, as proposed, does
> not work. If you only include dirty buffers at the final checkpoint
> before shutting down, you have no guarantee that any buffers that you
> either didn't write or didn't fsync previously are actually on disk.
> Therefore, you have no guarantee that the table data is not corrupted.
> So you really have to decide between including the unlogged-table
> buffers in EVERY checkpoint and not ever including them at all. Which
> one is right depends on your use case.
How can you get a buffer which was not written out *at all*? Do you want to
force all such pages to stay in shared_buffers? That sounds quite a bit more
complicated than what you proposed...

> For example, consider the poster who said that, when this feature is
> available, they plan to try ripping out their memcached instance and
> replacing it with PostgreSQL running unlogged tables. Suppose this
> poster (or someone else in a similar situation) has a 64 GB machine and
> is currently running a 60 GB memcached instance on it, which is not an
> unrealistic scenario for memcached. Suppose further that he dirties
> 25% of that data each hour. memcached is currently doing no writes to
> disk. When he switches to PostgreSQL and sets checkpoint_segments to
> a gazillion and checkpoint_timeout to the maximum, he's going to start
> writing 15 GB of data to disk every hour - data which he clearly
> doesn't care about losing, or preserving across restarts, because he's
> currently storing it in memcached. In fact, with memcached, he'll not
> only lose data at shutdown - he'll lose data on a regular basis when
> everything is running normally. We can try to convince ourselves that
> someone in this situation will not care about needing to get 15GB of
> disposable data per hour from memory to disk in order to have a
> feature that he doesn't need, but I think it's going to be pretty hard
> to make that credible.
To really support that use case we would first need to make shared_buffers
properly scale to 64GB - which, unfortunately, in my experience is not yet
the case.
Also, see the issues in the previous paragraph - I have severe doubts you can
support such a memcached scenario with pg. Either you spill to disk when your
buffers overflow (fine with me) or you throw away data the way memcached
does. I doubt there is a sensible implementation of the latter in pg. So you
will have to write to disk at some point...

> Third use case. Someone on pgsql-general mentioned that they want to
> write logs to PG, and can abide losing them if a crash happens, but
> not on a clean shutdown and restart. This person clearly shuts down
> their production database a lot more often than I do, but that is OK.
> By explicit stipulation, they want the survive-a-clean-shutdown
> behavior. I have no problem supporting that use case, providing they
> are willing to take the associated performance penalty at checkpoint
> time, which we don't know because we haven't asked, but I'm fine with
> assuming it's useful even though I probably wouldn't use it much
> myself.
Maybe I am missing something - but why does this imply we have to write data
at checkpoints? Just fsyncing every file belonging to a persistently-unlogged
table (or whatever sensible name anyone comes up with) is not prohibitively
expensive - in fact, doing that on a local $PGDATA of approx. 300GB with
loads of tables takes less than 15s on a system with a hot inode/dentry cache
and no dirty files (just `find $PGDATA -print0 | xargs -0 fsync_many_files`,
with fsync_many_files being a tiny C program that does
posix_fadvise(POSIX_FADV_DONTNEED) on all the files and then fsyncs every
one; see the sketch below). The assumption of a hot inode cache is realistic,
I think.

Andres
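PS: For anyone who wants to reproduce the measurement, fsync_many_files is
essentially just the following - a quick sketch with minimal error handling,
meant as a benchmark aid and nothing more:

```c
/*
 * fsync_many_files.c
 *
 * Invoked as: find $PGDATA -print0 | xargs -0 fsync_many_files
 * xargs passes the file names as arguments; for each one we issue
 * posix_fadvise(POSIX_FADV_DONTNEED) and then fsync() it.
 */
#define _XOPEN_SOURCE 600

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int
main(int argc, char **argv)
{
	int		i;

	for (i = 1; i < argc; i++)
	{
		int		fd = open(argv[i], O_RDONLY);

		if (fd < 0)
		{
			perror(argv[i]);
			continue;
		}

		/* drop cached pages first, so fsync() can't just hit clean cache */
		(void) posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);

		if (fsync(fd) < 0)
			perror(argv[i]);
		close(fd);
	}
	return 0;
}
```

Compile with something like `gcc -O2 -o fsync_many_files fsync_many_files.c`.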