Обсуждение: Having some problems with concurrent COPY commands

Поиск
Список
Период
Сортировка

Having some problems with concurrent COPY commands

От
Shaun Thomas
Дата:
Hi guys,

I've been doing some design investigation and ran into an interesting snag I didn't expect to find on 9.4 (and earlier). I wrote a quick python script to fork multiple simultaneous COPY commands to several separate tables and found that performance apparently degrades based on how many COPY commands are running.

For instance, in the logs with one COPY, I see about one second to import 100k rows. At two processes, it's 2 seconds. At four processes, it's 4 seconds. This is for each process. Thus loading 400k rows takes 16 seconds cumulatively. To me, it looked like some kind of locking issue, but pg_locks showed no waits during the load. In trying to figure this out, I ran across this discussion:


Which came after this:


It would appear I'm running into whatever issue the xloginsert_slots patch tried to address, but not much discussion exists afterwards. It's like the patch just kinda vanished into the ether even though it (apparently) massively improves PG's ability to scale data import.

I should note that setting wal_level to minimal, or doing the load on unlogged tables completely resolves this issue. However, those are not acceptable settings in a production environment. Is there any other way to get normal parallel COPY performance, or is that just currently impossible?

I also know 9.5 underwent a lot of locking improvements, so it might not be relevant. I just haven't gotten a chance to repeat my tests with 9.5 just yet.

--

Re: Having some problems with concurrent COPY commands

От
Andres Freund
Дата:
Hi,

On 2015-10-12 13:17:53 -0500, Shaun Thomas wrote:
> It would appear I'm running into whatever issue the xloginsert_slots patch
> tried to address, but not much discussion exists afterwards.

That patch is merged, it's just that the number of slots is
hardcoded. You can recompile postgres with different values by changing
#define NUM_XLOGINSERT_LOCKS  8
in xlog.c to a different value. A restart is enough afterwards.

> Is there any other way to
> get normal parallel COPY performance, or is that just currently impossible?
>
> I also know 9.5 underwent a lot of locking improvements, so it might
> not be relevant. I just haven't gotten a chance to repeat my tests
> with 9.5 just yet.

Hard to say anything substantive without further information. Any chance
you could provide profiles of such a run? If yes, I can help you with
instructions. I'm just to lazy to write them up if not.

Greetings,

Andres Freund


Re: Having some problems with concurrent COPY commands

От
Jeff Janes
Дата:
On Mon, Oct 12, 2015 at 11:17 AM, Shaun Thomas <bonesmoses@gmail.com> wrote:
Hi guys,

I've been doing some design investigation and ran into an interesting snag I didn't expect to find on 9.4 (and earlier). I wrote a quick python script to fork multiple simultaneous COPY commands to several separate tables and found that performance apparently degrades based on how many COPY commands are running.

For instance, in the logs with one COPY, I see about one second to import 100k rows. At two processes, it's 2 seconds. At four processes, it's 4 seconds. This is for each process. Thus loading 400k rows takes 16 seconds cumulatively. To me, it looked like some kind of locking issue, but pg_locks showed no waits during the load. In trying to figure this out, I ran across this discussion:


Which came after this:


It would appear I'm running into whatever issue the xloginsert_slots patch tried to address, but not much discussion exists afterwards. It's like the patch just kinda vanished into the ether even though it (apparently) massively improves PG's ability to scale data import.

I should note that setting wal_level to minimal, or doing the load on unlogged tables completely resolves this issue. However, those are not acceptable settings in a production environment. Is there any other way to get normal parallel COPY performance, or is that just currently impossible?

I also know 9.5 underwent a lot of locking improvements, so it might not be relevant. I just haven't gotten a chance to repeat my tests with 9.5 just yet.


Can you provide the test script?  Also, have you tuned your database for high io throughput?  What is your storage system like?

Re: Having some problems with concurrent COPY commands

От
Shaun Thomas
Дата:
On Mon, Oct 12, 2015 at 1:28 PM, Andres Freund <andres@anarazel.de> wrote:

> Any chance
> you could provide profiles of such a run?

This is as simple as I could make it reliably. With one copy running,
the thread finishes in about 1 second. With 2, it's 1.5s each, and
with all 4, it's a little over 3s for each according to the logs. I
have log_min_duration_statement set to 1000, so it's pretty obvious.
The scary part is that it's not even scaling linearly; performance is
actually getting *worse* with each subsequent thread.

Regarding performance, all of this fits in memory. The tables are only
100k rows with the COPY statement. The machine itself is 8 CPUs with
32GB of RAM, so it's not an issue of hardware. So far as I can tell,
it happens on every version I've tested on, from 9.2 to 9.4. I also
take back what I said about wal_level. Setting it to minimal does
nothing. Disabling archive_mode and setting max_wal_senders to 0 also
does nothing. With 4 concurrent processes, each takes 3 seconds, for a
total of 12 seconds to import 400k rows when it would take 4 seconds
to do sequentially. Sketchy.

COPY (
  SELECT id, id % 100, id % 1000, now() - (id || 's')::INTERVAL
    FROM generate_series(1, 100000) a(id)
) TO '/tmp/loadtest1.csv';

CREATE TABLE test_copy (
  id            SERIAL PRIMARY KEY,
  location      VARCHAR NOT NULL,
  reading       BIGINT NOT NULL,
  reading_date  TIMESTAMP NOT NULL
);

CREATE INDEX idx_test_copy_location ON test_copy (location);
CREATE INDEX idx_test_copy_date ON test_copy (reading_date);

CREATE TABLE test_copy2 (LIKE test_copy INCLUDING INDEXES);
CREATE TABLE test_copy3 (LIKE test_copy INCLUDING INDEXES);
CREATE TABLE test_copy4 (LIKE test_copy INCLUDING INDEXES);

psql -c "COPY test_copy FROM '/tmp/loadtest1.csv'" &>/dev/null &
psql -c "COPY test_copy2 FROM '/tmp/loadtest1.csv'" &>/dev/null &
psql -c "COPY test_copy3 FROM '/tmp/loadtest1.csv'" &>/dev/null &
psql -c "COPY test_copy4 FROM '/tmp/loadtest1.csv'" &>/dev/null &


--
Shaun Thomas
bonesmoses@gmail.com
http://bonesmoses.org/


Re: Having some problems with concurrent COPY commands

От
Heikki Linnakangas
Дата:
On 10/12/2015 11:14 PM, Shaun Thomas wrote:
> On Mon, Oct 12, 2015 at 1:28 PM, Andres Freund <andres@anarazel.de> wrote:
>
>> Any chance
>> you could provide profiles of such a run?
>
> This is as simple as I could make it reliably. With one copy running,
> the thread finishes in about 1 second. With 2, it's 1.5s each, and
> with all 4, it's a little over 3s for each according to the logs. I
> have log_min_duration_statement set to 1000, so it's pretty obvious.
> The scary part is that it's not even scaling linearly; performance is
> actually getting *worse* with each subsequent thread.
>
> Regarding performance, all of this fits in memory. The tables are only
> 100k rows with the COPY statement. The machine itself is 8 CPUs with
> 32GB of RAM, so it's not an issue of hardware. So far as I can tell,
> it happens on every version I've tested on, from 9.2 to 9.4. I also
> take back what I said about wal_level. Setting it to minimal does
> nothing. Disabling archive_mode and setting max_wal_senders to 0 also
> does nothing. With 4 concurrent processes, each takes 3 seconds, for a
> total of 12 seconds to import 400k rows when it would take 4 seconds
> to do sequentially. Sketchy.

I was not able reproduce that behaviour on my laptop. I bumped the
number of rows in your script 100000, to make it run a bit longer.
Attached is the script I used. The total wallclock time the COPYs takes
on 9.4 is about 8 seconds for a single COPY, and 12 seconds for 4
concurrent COPYs. So it's not scaling as well as you might hope, but
it's certainly not worse-than-serial either, as you you're seeing.

If you're seeing this on 9.2 and 9.4 alike, this can't be related to the
XLogInsert scaling patch, although you might've found a case where that
patch didn't help where it should've. I ran "perf" to profile the test
case, and it looks like about 80% of the CPU time is spent in the b-tree
comparison function. That doesn't leave much scope for XLogInsert
scalability to matter one way or another.

I have no explanation for what you're seeing though. A bad spinlock
implementation perhaps? Anything special about the hardware at all? Can
you profile it on your system? Which collation?

- Heikki

Вложения

Re: Having some problems with concurrent COPY commands

От
Shaun Thomas
Дата:
On Tue, Oct 13, 2015 at 2:32 AM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> 80% of the CPU time is spent in the b-tree comparison function.

In the logs, my duration per COPY command increases from about 1400ms
for one process to about 3800ms when I have four running concurrently.
That's really my only data point, unfortunately. Strace isn't super
helpful because it just says 70-ish% of the time is wait4, but that's
not significantly different than the results using one process.

Everything else is bog standard default. I bumped up
checkpoint_segments to avoid checkpoints during the test, and
increased work_mem and maintenance_work_mem to avoid disk affecting
results, and still got the same behavior.

I wasn't blaming the patch. :) I thought it didn't get merged or
something, and after learning that wasn't the case, my only theory
went out the window. I'm a bit stuck, because this seems incredibly
wrong. I'll keep digging.

--
Shaun Thomas
bonesmoses@gmail.com
http://bonesmoses.org/


Re: Having some problems with concurrent COPY commands

От
Andres Freund
Дата:
On 2015-10-13 07:14:01 -0700, Shaun Thomas wrote:
> On Tue, Oct 13, 2015 at 2:32 AM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> > 80% of the CPU time is spent in the b-tree comparison function.
>
> In the logs, my duration per COPY command increases from about 1400ms
> for one process to about 3800ms when I have four running concurrently.
> That's really my only data point, unfortunately. Strace isn't super
> helpful because it just says 70-ish% of the time is wait4, but that's
> not significantly different than the results using one process.

Please run a profile. Compile postgres with CFLAGS='-O2 -fno-omit-frame-pointer'
as an argument to configure. That'll allow us to get a hierarchical profile.

Then first do a plain cpu profile:
perf record -g -a sleep 5
perf report > somefile

Then let's look at lock contention:
perf record -g -a -e syscalls:sys_enter_semop sleep 5
perf report > somefile

and send the results.

Thanks,

Andres


Re: Having some problems with concurrent COPY commands

От
Shaun Thomas
Дата:
On Tue, Oct 13, 2015 at 7:23 AM, Andres Freund <andres@anarazel.de> wrote:
> and send the results.

Whelp, I'm an idiot. I can't account for how I did it, but I can only
assume I didn't export my ports in the tests properly. I ran
everything again and there's a marked difference between 9.3 and 9.4.
The parallel copy times still inflate, but only from 1.4s to 2.5s at 4
procs. Though it gets a bit dicey after that.

I tried to see what the growth curve looks like, but the numbers are
wildly inconsistent after 4 procs. Even at 6, it went anywhere from
4.3 to 7s for each COPY, even while no checkpoint is running. COPY
time definitely increases with each additional process though, which
is likely expected. I was hoping the lock improvements in 9.5 would
improve this area too, but performance is the same on 9.5 (yes I'm
sure this time).

I can still send the perfs, but I suspect they're not exceptionally
useful anymore. :)

As a side note, using INSERT instead scales almost exactly linearly.
This would be useful, except that INSERT is already at least a
magnitude slower than COPY. Hah.

--
Shaun Thomas
bonesmoses@gmail.com
http://bonesmoses.org/


Re: Having some problems with concurrent COPY commands

От
David Rowley
Дата:
On 14 October 2015 at 08:33, Shaun Thomas <bonesmoses@gmail.com> wrote:
On Tue, Oct 13, 2015 at 7:23 AM, Andres Freund <andres@anarazel.de> wrote:
> and send the results.

Whelp, I'm an idiot. I can't account for how I did it, but I can only
assume I didn't export my ports in the tests properly. I ran
everything again and there's a marked difference between 9.3 and 9.4.
The parallel copy times still inflate, but only from 1.4s to 2.5s at 4
procs. Though it gets a bit dicey after that.



Do the times still inflate in the same way if you perform the COPY before adding the indexes to the table?

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services