Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance

From: Jeff Janes
Subject: Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance
Date:
Msg-id: CAMkU=1zSw-FYw0KKOZM=E1iTZRrt8QN778LehM8ZWVWJ0TTdeQ@mail.gmail.com
In response to: Re: Linux kernel impact on PostgreSQL performance  (Claudio Freire <klaussfreire@gmail.com>)
List: pgsql-hackers
On Thursday, January 16, 2014, Dave Chinner <david@fromorbit.com> wrote:
> On Thu, Jan 16, 2014 at 03:58:56PM -0800, Jeff Janes wrote:
> > On Thu, Jan 16, 2014 at 3:23 PM, Dave Chinner <david@fromorbit.com> wrote:
> >
> > > On Wed, Jan 15, 2014 at 06:14:18PM -0600, Jim Nasby wrote:
> > > > On 1/15/14, 12:00 AM, Claudio Freire wrote:
> > > > >My completely unproven theory is that swapping is overwhelmed by
> > > > >near-misses. Ie: a process touches a page, and before it's
> > > > >actually swapped in, another process touches it too, blocking on
> > > > >the other process' read. But the second process doesn't account
> > > > >for that page when evaluating predictive models (ie: read-ahead),
> > > > >so the next I/O by process 2 is unexpected to the kernel. Then
> > > > >the same with 1. Etc... In essence, swap, by a fluke of its
> > > > >implementation, fails utterly to predict the I/O pattern, and
> > > > >results in far sub-optimal reads.
> > > > >
> > > > >Explicit I/O is free from that effect, all read calls are
> > > > >accountable, and that makes a difference.
> > > > >
> > > > >Maybe, if the kernel could be fixed in that respect, you could
> > > > >consider mmap'd files as a suitable form of temporary storage.
> > > > >But that would depend on the success and availability of such a
> > > > >fix/patch.
> > > >
> > > > Another option is to consider some of the more "radical" ideas in
> > > > this thread, but only for temporary data. Our write sequencing and
> > > > other needs are far less stringent for this stuff.  -- Jim C.
> > >
> > > I suspect that a lot of the temporary data issues can be solved by
> > > using tmpfs for temporary files....
> > >
> >
> > Temp files can collectively reach hundreds of gigs.
>
> So unless you have terabytes of RAM you're going to have to write
> them back to disk.

If they turn out to be hundreds of gigs, then yes they have to hit disk (at least on my hardware).  But if they are 10 gig, then maybe not (depending on whether other people decide to do similar things at the same time I'm going to be doing it--something which is often hard to predict).   But now for every action I take, I have to decide, is this going to take 10 gig, or 14 gig, and how absolutely certain am I?  And is someone else going to try something similar at the same time?  What a hassle.  It would be so much nicer to say "This is accessed sequentially, and will never be fsynced.  Maybe it will fit entirely in memory, maybe it won't, either way, you know what to do."  

If I start out writing to tmpfs, I can't very easily change my mind 94% of the way through and decide to go somewhere else.  But the kernel, effectively, can.
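To make that "tell the kernel what to expect" wish concrete: the closest existing knob I know of is posix_fadvise(). Here is a purely hypothetical sketch (the path, sizes, and loop below are made up for illustration, and this is not something PostgreSQL currently does for its temp files) of hinting a sequentially written temp file and then releasing its cache:

/* Purely hypothetical sketch: hint the kernel about a sequentially
 * written temp file with posix_fadvise().  Path and sizes are made up. */
#define _POSIX_C_SOURCE 200112L
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    const char *path = "/tmp/pgsql_tmp_example";    /* illustrative path */
    char        buf[8192];
    int         fd = open(path, O_CREAT | O_RDWR | O_TRUNC, 0600);

    if (fd < 0)
    {
        perror("open");
        return 1;
    }

    /* We will write, and later read, this file strictly sequentially. */
    (void) posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);

    memset(buf, 'x', sizeof(buf));
    for (int i = 0; i < 1024; i++)      /* ~8 MB of "temp" data */
    {
        if (write(fd, buf, sizeof(buf)) != (ssize_t) sizeof(buf))
        {
            perror("write");
            break;
        }
    }

    /* Done with the data: let the kernel reclaim the cached pages rather
     * than letting them compete with hotter data.  (On Linux this can also
     * kick off writeback of dirty pages in the range, which is part of why
     * it is not quite the hint we actually want.) */
    (void) posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);

    close(fd);
    unlink(path);
    return 0;
}

The catch is that these calls are still just hints about caching and read-ahead; there is no way to say "this data is disposable, write it back only under memory pressure", which is exactly what I want and exactly what tmpfs-plus-swap gives you without letting me change my mind.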
 
> But there's something here that I'm not getting - you're talking
> about a data set that you want to keep cache resident that is at
> least an order of magnitude larger than the cyclic 5-15 minute WAL
> dataset that ongoing operations need to manage to avoid IO storms.

Those are mostly orthogonal issues.  The permanent files need to be fsynced on a regular basis, and might have gigabytes of data dirtied at random from within terabytes of underlying storage.  We better start writing that pretty quickly, or when we do issue the fsyncs, the world will fall apart.

The temporary files will never need to be fsynced, and can be written out sequentially if they do ever need to be written out.  Better to delay this as much as feasible.
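To make that contrast concrete, here is a rough, Linux-specific sketch of the sort of thing I mean for the permanent files (the helper names and path are invented, and this is not what PostgreSQL actually does): start writeback of dirtied ranges early with sync_file_range(), so the eventual fsync() is not a cliff, while the temp files simply never get either call:

/* Illustrative sketch only (Linux-specific; helper names and path invented).
 * Permanent data: trickle out dirty ranges with sync_file_range() so the
 * checkpoint-time fsync() has little left to do.  Temp data: do nothing,
 * and let the kernel write it back only if it must. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Called after dirtying [offset, offset + len) of a durable file. */
static void
flush_hint(int fd, off_t offset, off_t len)
{
    /* Start asynchronous writeback of just this range; do not wait. */
    if (sync_file_range(fd, offset, len, SYNC_FILE_RANGE_WRITE) != 0)
        perror("sync_file_range");
}

/* Called when durability is actually required (checkpoint time). */
static void
checkpoint_file(int fd)
{
    /* The earlier hints mean this mostly just waits on stragglers. */
    if (fsync(fd) != 0)
        perror("fsync");
}

int main(void)
{
    char    buf[8192];
    int     fd = open("/tmp/relation_example", O_CREAT | O_RDWR, 0600);

    if (fd < 0)
    {
        perror("open");
        return 1;
    }

    memset(buf, 'x', sizeof(buf));
    if (write(fd, buf, sizeof(buf)) != (ssize_t) sizeof(buf))
        perror("write");

    flush_hint(fd, 0, sizeof(buf));     /* push this range out early */
    checkpoint_file(fd);                /* cheap fsync at "checkpoint" */

    close(fd);
    return 0;
}

The asymmetry is the whole point: for the durable files we want I/O to start earlier than the kernel would start it on its own, and for the temp files we want it to start later, or ideally never.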


> Where do these temporary files fit into this picture, how fast do
> they grow and why do they need to be so large in comparison to
> the ongoing modifications being made to the database?

The permanent files tend to be things like "Jane Doe just bought a pair of green shoes from Hendrick Green Shoes Limited--record that, charge her credit card, and schedule delivery".  The temp files are more like "It is the end of the year, how many shoes have been purchased in each color from each manufacturer for each quarter over the last 6 years?"   So the temp files quickly manipulate data that has slowly accumulated over a very long time, while the permanent files represent the process of that accumulation.

If you are Amazon, of course, you have thousands of people who can keep two sets of records, one organized for fast update and one slightly delayed copy reorganized for fast analysis, and also do partial analysis on an ongoing basis and roll it up in ways that can be incrementally updated.  If you are not Amazon, it would be nice if one system did a better job of doing both things, with the trade-off between the two being dynamic and automatic.

Cheers,

Jeff
