Re: COPY locking

Поиск
Список
Период
Сортировка
От John Coers
Тема Re: COPY locking
Дата
Msg-id 3AFAA933.B01F403A@intrinsity.com
обсуждение исходный текст
Ответ на COPY locking  (John Coers <coers@intrinsity.com>)
Ответы Re: COPY locking  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-general
> No, they all share the same disk buffer in shared memory for whatever is
> currently the last page of the file.  Adding a tuple into that page
> requires grabbing a short-term lock on that page (since obviously only
> one process can do it at a time, else they'd be trying to insert into
> the same physical spot).  That lock is released as soon as the tuple is
> physically stored in the buffer, but with enough processes running COPY
> into the same table, you'll get contention for the lock.
Ah so!  So after each tuple is written to shared memory there is a giant
scramble for the lock.  Explains all the semops and timer sets and sleeps
in the truss output that I've been seeing.


> I suspect the performance issue you're seeing is not so much the need
> for a short-term lock (it's awful hard to see how to do without one,
> anyway) as it is that our current spinlock implementation is pretty bad
> in the face of heavy contention.  On most platforms the backoff when
> you can't get the lock on the first try is a ten-millisecond sleep,
> which is huge overkill for locks that are only held for periods of
> microseconds.  There was discussion in pghackers a few months ago about
> reimplementing spinlocks in some more modern way (eg, using Posix
> semaphores if available) but no progress has been made yet.

I saw that thread while looking for answers to my problem.  Put me in as a
"Yay" vote.  My use of this system is different from most I think.  I need to cram
huge amounts of data in from multiple clients, then I'll pull it all out and
process it later although there will be an occasional query to monitor progress.
I don't need fancy queries.


> > Are there any suggested techniques or tweaks I can make to avoid this
> > interference?
>
> Do you really need to run multiple COPYs in parallel, or would
> serializing them be just as good?  You could serialize them without
> locking out readers by doing
>
>         BEGIN;
>         LOCK TABLE foo IN EXCLUSIVE MODE;
>         COPY foo FROM ...
>         END;
>
> "EXCLUSIVE" mode isn't quite as exclusive as a plain LOCK TABLE; it
> still allows other readers.  See "Table-level locks" in the User's
> Guide.
>

They don't have to be parallel in the strictest sense.  As clients get
finished with jobs they will try to connect and upload data.  If serializing
the process makes it faster, then that's what I'll try.  Counterintuitive though
it may be.  Of course getting my best performance with N=2 was counterintuitive too...



--
John Coers            Intrinsity, Inc.
coers@intrinsity.com  Austin, Texas

В списке pgsql-general по дате отправления:

Предыдущее
От: Justin Clift
Дата:
Сообщение: Re: ER diagrams
Следующее
От: Tom Lane
Дата:
Сообщение: Re: GRANT