Обсуждение: postgres_fdw insert extremely slow

Поиск
Список
Период
Сортировка

postgres_fdw insert extremely slow

От
Mats Julian Olsen
Дата:
Helle pgsql-general,

We have just set up postgres_fdw between two postgres databases, x and y, with the plan to periodically insert data from x into y.

We've successfully set up the connection with a few options: `use_remote_estimate 'true'` and `fetch_size '50000'` (the latter we've played around with). We've run ANALYZE on the foreign server.

SELECTs against the foreign table returns in milliseconds, however an INSERT of 100 rows takes 10 seconds. we have roughly 200 000 rows for the initial sync, which translates into ~6 hours.

Is this the expected performance of postgre_fdw? Is there anything we've overlooked when setting this up? Very curious to hear experiences from the community when doing read/write and not just read from foreign sources.

Best regards,

--
Mats
CTO @ Dune Analytics

Re: postgres_fdw insert extremely slow

От
Adrian Klaver
Дата:
On 11/25/20 8:37 AM, Mats Julian Olsen wrote:
> Helle pgsql-general,
> 
> We have just set up postgres_fdw between two postgres databases, x and 
> y, with the plan to periodically insert data from x into y.
> 
> We've successfully set up the connection with a few options: 
> `use_remote_estimate 'true'` and `fetch_size '50000'` (the latter we've 
> played around with). We've run ANALYZE on the foreign server.
> 
> SELECTs against the foreign table returns in milliseconds, however an 
> INSERT of 100 rows takes 10 seconds. we have roughly 200 000 rows for 
> the initial sync, which translates into ~6 hours.

Postgres version(s)?

Without the actual query and EXPLAIN ANALYZE on it this will be 
difficult to answer with any detail.

Also would be nice to have the table schema.

> 
> Is this the expected performance of postgre_fdw? Is there anything we've 
> overlooked when setting this up? Very curious to hear experiences from 
> the community when doing read/write and not just read from foreign sources.
> 
> Best regards,
> 
> -- 
> Mats
> CTO @ Dune Analytics
> We're hiring: https://careers.duneanalytics.com 
> <https://careers.duneanalytics.com>


-- 
Adrian Klaver
adrian.klaver@aklaver.com



Re: postgres_fdw insert extremely slow

От
Mats Julian Olsen
Дата:
Apologies for the sloppiness!
> Postgres version(s)?

x: Aurora PostgreSQL 11.7 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 
4.9.3, 64-bit (RDS)

y: PostgreSQL 12.2 (Ubuntu 12.2-2.pgdg19.10+1) on x86_64-pc-linux-gnu, 
compiled by gcc (Ubuntu 9.2.1-9ubuntu2) 9.2.1 20191008, 64-bit (GCP)

>
> Without the actual query and EXPLAIN ANALYZE on it this will be 
> difficult to answer with any detail.
>
> Also would be nice to have the table schema.
>

Here's the query, schema and the EXPLAIN ANALYZE

query: https://gist.github.com/mewwts/704663e19a21bc51864f3298f09a18c0

plan: https://explain.depesz.com/s/RQFQ


Best,

-- 
Mats
CTO @ Dune Analytics
We're hiring: https://careers.duneanalytics.com




Re: postgres_fdw insert extremely slow

От
Adrian Klaver
Дата:
On 11/25/20 8:48 AM, Mats Julian Olsen wrote:
> Apologies for the sloppiness!
>> Postgres version(s)?
> 
> x: Aurora PostgreSQL 11.7 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 
> 4.9.3, 64-bit (RDS)
> 
> y: PostgreSQL 12.2 (Ubuntu 12.2-2.pgdg19.10+1) on x86_64-pc-linux-gnu, 
> compiled by gcc (Ubuntu 9.2.1-9ubuntu2) 9.2.1 20191008, 64-bit (GCP)

So they are separated by what network distance?

> 
>>
>> Without the actual query and EXPLAIN ANALYZE on it this will be 
>> difficult to answer with any detail.
>>
>> Also would be nice to have the table schema.
>>
> 
> Here's the query, schema and the EXPLAIN ANALYZE

FYI, you will get more responses by including below information directly 
in the email. To that end:

On x

CREATE TABLE labels ( 

     id integer NOT NULL, 

     address_id bytea NOT NULL, 

     name text NOT NULL, 

     author character varying(50) NOT NULL, 

     type text NOT NULL, 

     source text, 

     updated_at timestamp with time zone DEFAULT now() NOT NULL, 

     CONSTRAINT lowercase_name CHECK ((name = lower(name))), 

     CONSTRAINT lowercase_type CHECK ((type = lower(type))), 

     CONSTRAINT whitespace_name CHECK ((regexp_replace(btrim(name, ' 
'::text), '(\s+)'::text, ' '::text, 'g'::text) = name)),
     CONSTRAINT whitespace_type CHECK ((regexp_replace(btrim(type, ' 
'::text), '(\s+)'::text, ' '::text, 'g'::text) = type))
); 

CREATE INDEX labels_label ON labels USING BTREE (name);
CREATE INDEX labels_label_type ON labels USING BTREE (type);
CREATE INDEX labels_address ON labels USING HASH (address_id);
CREATE INDEX labels_source ON labels USING BTREE (source);

On y

CREATE TABLE labels.labels (
     address     bytea PRIMARY KEY,
     labels      text[]
);

> 
> query: https://gist.github.com/mewwts/704663e19a21bc51864f3298f09a18c0

INSERT INTO foreign.labels (address, labels)
SELECT address_id, ARRAY_AGG(name) AS labels
FROM labels
GROUP BY 1
LIMIT 100;

> 
> plan: https://explain.depesz.com/s/RQFQ

Insert on labels  (cost=0.42..26.19 rows=100 width=53) (actual 
time=11541.205..11541.205 rows=0 loops=1)
    ->  Limit  (cost=0.42..25.19 rows=100 width=53) (actual 
time=0.068..3.549 rows=100 loops=1)
          ->  GroupAggregate  (cost=0.42..26502.02 rows=106996 width=53) 
(actual time=0.066..3.449 rows=100 loops=1)
                Group Key: labels_1.address_id
                ->  Index Only Scan using 
labels_address_id_type_name_key on labels labels_1  (cost=0.42..24068.85 
rows=219145 width=31) (actual time=0.054..0.414 rows=201 loops=1)
                      Heap Fetches: 0
  Planning Time: 0.102 ms
  Execution Time: 12797.143 ms


> 
> 
> Best,
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com



Re: postgres_fdw insert extremely slow

От
Tom Lane
Дата:
Mats Julian Olsen <mats@duneanalytics.com> writes:
>> Postgres version(s)?

> x: Aurora PostgreSQL 11.7 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 
> 4.9.3, 64-bit (RDS)
> y: PostgreSQL 12.2 (Ubuntu 12.2-2.pgdg19.10+1) on x86_64-pc-linux-gnu, 
> compiled by gcc (Ubuntu 9.2.1-9ubuntu2) 9.2.1 20191008, 64-bit (GCP)

Hmm, I wonder if Aurora could be doing something funny here?

Anyway, to answer your question, no that's not the expected level of
performance.  postgres_fdw is certainly not inexpensive compared to
local table access, but I'd still think inserts should take only a
small number of milliseconds.  It seems like something must be
blocking the query.  Have you tried looking into pg_locks on the
remote server while this query is running?

            regards, tom lane



Re: postgres_fdw insert extremely slow

От
Mats Julian Olsen
Дата:
On 25.11.2020 17:57, Adrian Klaver wrote:
> On 11/25/20 8:48 AM, Mats Julian Olsen wrote:
>> Apologies for the sloppiness!
>>> Postgres version(s)?
>>
>> x: Aurora PostgreSQL 11.7 on x86_64-pc-linux-gnu, compiled by gcc 
>> (GCC) 4.9.3, 64-bit (RDS)
>>
>> y: PostgreSQL 12.2 (Ubuntu 12.2-2.pgdg19.10+1) on 
>> x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.2.1-9ubuntu2) 9.2.1 
>> 20191008, 64-bit (GCP)
>
> So they are separated by what network distance?
>
By quite a lot! One server is in eu-west-1 and the other in gcp 
us-central1-a. Note however, that SELECTs across the network are very 
fast, so it doesn't seem like this is a network issue.
>>
>>>
>>> Without the actual query and EXPLAIN ANALYZE on it this will be 
>>> difficult to answer with any detail.
>>>
>>> Also would be nice to have the table schema.
>>>
>>
>> Here's the query, schema and the EXPLAIN ANALYZE
>
> FYI, you will get more responses by including below information 
> directly in the email. To that end:
>
> On x
>
> CREATE TABLE labels (
>     id integer NOT NULL,
>     address_id bytea NOT NULL,
>     name text NOT NULL,
>     author character varying(50) NOT NULL,
>     type text NOT NULL,
>     source text,
>     updated_at timestamp with time zone DEFAULT now() NOT NULL,
>     CONSTRAINT lowercase_name CHECK ((name = lower(name))),
>     CONSTRAINT lowercase_type CHECK ((type = lower(type))),
>     CONSTRAINT whitespace_name CHECK ((regexp_replace(btrim(name, ' 
> '::text), '(\s+)'::text, ' '::text, 'g'::text) = name)),
>     CONSTRAINT whitespace_type CHECK ((regexp_replace(btrim(type, ' 
> '::text), '(\s+)'::text, ' '::text, 'g'::text) = type))
> );
> CREATE INDEX labels_label ON labels USING BTREE (name);
> CREATE INDEX labels_label_type ON labels USING BTREE (type);
> CREATE INDEX labels_address ON labels USING HASH (address_id);
> CREATE INDEX labels_source ON labels USING BTREE (source);
>
> On y
>
> CREATE TABLE labels.labels (
>     address     bytea PRIMARY KEY,
>     labels      text[]
> );
>
>>
>> query: https://gist.github.com/mewwts/704663e19a21bc51864f3298f09a18c0
>
> INSERT INTO foreign.labels (address, labels)
> SELECT address_id, ARRAY_AGG(name) AS labels
> FROM labels
> GROUP BY 1
> LIMIT 100;
>
>>
>> plan: https://explain.depesz.com/s/RQFQ
>
> Insert on labels  (cost=0.42..26.19 rows=100 width=53) (actual 
> time=11541.205..11541.205 rows=0 loops=1)
>    ->  Limit  (cost=0.42..25.19 rows=100 width=53) (actual 
> time=0.068..3.549 rows=100 loops=1)
>          ->  GroupAggregate  (cost=0.42..26502.02 rows=106996 
> width=53) (actual time=0.066..3.449 rows=100 loops=1)
>                Group Key: labels_1.address_id
>                ->  Index Only Scan using 
> labels_address_id_type_name_key on labels labels_1 
> (cost=0.42..24068.85 rows=219145 width=31) (actual time=0.054..0.414 
> rows=201 loops=1)
>                      Heap Fetches: 0
>  Planning Time: 0.102 ms
>  Execution Time: 12797.143 ms
>
>
>>
>>
>> Best,
Thank you for inlining this!

-- 
Mats
CTO @ Dune Analytics
We're hiring: https://careers.duneanalytics.com




Re: postgres_fdw insert extremely slow

От
Mats Julian Olsen
Дата:
On 25.11.2020 17:58, Tom Lane wrote:
> Mats Julian Olsen <mats@duneanalytics.com> writes:
>>> Postgres version(s)?
>> x: Aurora PostgreSQL 11.7 on x86_64-pc-linux-gnu, compiled by gcc (GCC)
>> 4.9.3, 64-bit (RDS)
>> y: PostgreSQL 12.2 (Ubuntu 12.2-2.pgdg19.10+1) on x86_64-pc-linux-gnu,
>> compiled by gcc (Ubuntu 9.2.1-9ubuntu2) 9.2.1 20191008, 64-bit (GCP)
> Hmm, I wonder if Aurora could be doing something funny here?
>
> Anyway, to answer your question, no that's not the expected level of
> performance.  postgres_fdw is certainly not inexpensive compared to
> local table access, but I'd still think inserts should take only a
> small number of milliseconds.  It seems like something must be
> blocking the query.  Have you tried looking into pg_locks on the
> remote server while this query is running?
>
>             regards, tom lane
Thanks Tom, I'll try to spin up a regular Postgres instance on both rds 
and ec2 and see if that helps. As for the locks, I can not see any 
blocked activity on the remote server while the query runs.

-- 
Mats
CTO @ Dune Analytics
We're hiring: https://careers.duneanalytics.com




Re: postgres_fdw insert extremely slow

От
Tom Lane
Дата:
Mats Julian Olsen <mats@duneanalytics.com> writes:
> On 25.11.2020 17:58, Tom Lane wrote:
>> ... Have you tried looking into pg_locks on the
>> remote server while this query is running?

> Thanks Tom, I'll try to spin up a regular Postgres instance on both rds 
> and ec2 and see if that helps. As for the locks, I can not see any 
> blocked activity on the remote server while the query runs.

Another place to check is pg_stat_activity, specifically wait_event_type
and wait_event.

            regards, tom lane



Re: postgres_fdw insert extremely slow

От
Mats Julian Olsen
Дата:
On 25.11.2020 18:12, Tom Lane wrote:
> Mats Julian Olsen <mats@duneanalytics.com> writes:
>> On 25.11.2020 17:58, Tom Lane wrote:
>>> ... Have you tried looking into pg_locks on the
>>> remote server while this query is running?
>> Thanks Tom, I'll try to spin up a regular Postgres instance on both rds
>> and ec2 and see if that helps. As for the locks, I can not see any
>> blocked activity on the remote server while the query runs.
> Another place to check is pg_stat_activity, specifically wait_event_type
> and wait_event.

Thanks Tom, I've now setup the fdw from a local postgres instance and 
seeing similar timings for the selects and inserts.

I've got some more numbers here:

x (aurora aws eu-west-1) => y  (gcp gce us-central-1):  ~15 s

local (eu) => y: ~15 s

local (eu) => test 1 (postgres aws rds eu-west-1): ~4 s

local (eu) => test 2 (postgres google cloud sql, us-central-1): ~15s

local (eu) => local (eu): < 1s

SELECTs in all instances are sub-second.


To me this does indicate some sort of networking issue, but I'm 
wondering if INSERTs are treated differently than SELECTs in 
postgres_fdw? The only feasibly explanation I have is that postgres_fdw 
does many more network calls for INSERT than for SELECT, e.g. something 
like 1 for SELECT and `n` for INSERT?

Best,


Mats




Re: postgres_fdw insert extremely slow

От
Tom Lane
Дата:
Mats Julian Olsen <mats@duneanalytics.com> writes:
> I've got some more numbers here:
> ...
> To me this does indicate some sort of networking issue, but I'm 
> wondering if INSERTs are treated differently than SELECTs in 
> postgres_fdw? The only feasibly explanation I have is that postgres_fdw 
> does many more network calls for INSERT than for SELECT, e.g. something 
> like 1 for SELECT and `n` for INSERT?

I don't have the code in front of me, but from memory, postgres_fdw
will issue an INSERT statement to the remote for each row it has to
insert.  Maybe you are indeed just dealing with spectacularly bad
network round trip times.

You could try turning on log_statement and/or log_duration on the
remote to see if that sheds any more light about how much time is
spent executing each insertion vs. the network delay.

There's been some recent discussion about teaching postgres_fdw to
batch insertions, which would likely be helpful in your situation.
I don't know how close that is to committable, but in any case
it couldn't see the light of day earlier than v14.  In the meantime,
if you're sufficiently desperate maybe you could switch to using
dblink with manually-issued multi-row INSERTs.  (This needn't
preclude continuing to access the table with postgres_fdw when
that's suitable.)

            regards, tom lane



Re: postgres_fdw insert extremely slow

От
Adrian Klaver
Дата:
On 11/25/20 11:18 AM, Tom Lane wrote:
> Mats Julian Olsen <mats@duneanalytics.com> writes:
>> I've got some more numbers here:
>> ...
>> To me this does indicate some sort of networking issue, but I'm
>> wondering if INSERTs are treated differently than SELECTs in
>> postgres_fdw? The only feasibly explanation I have is that postgres_fdw
>> does many more network calls for INSERT than for SELECT, e.g. something
>> like 1 for SELECT and `n` for INSERT?
> 
> I don't have the code in front of me, but from memory, postgres_fdw
> will issue an INSERT statement to the remote for each row it has to
> insert.  Maybe you are indeed just dealing with spectacularly bad
> network round trip times.
> 
> You could try turning on log_statement and/or log_duration on the
> remote to see if that sheds any more light about how much time is
> spent executing each insertion vs. the network delay.
> 
> There's been some recent discussion about teaching postgres_fdw to
> batch insertions, which would likely be helpful in your situation.
> I don't know how close that is to committable, but in any case
> it couldn't see the light of day earlier than v14.  In the meantime,
> if you're sufficiently desperate maybe you could switch to using
> dblink with manually-issued multi-row INSERTs.  (This needn't
> preclude continuing to access the table with postgres_fdw when
> that's suitable.)

So even if Mats where to break this query:

INSERT INTO foreign.labels (address, labels)
SELECT address_id, ARRAY_AGG(name) AS labels
FROM labels
GROUP BY 1
LIMIT 100;

down into something like this:

INSERT INTO foreign.labels (address, labels)
VALUES (), (), (), ();

postgres_fdw would send it as individual INSERTs?

> 
>             regards, tom lane
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com



Re: postgres_fdw insert extremely slow

От
Laurenz Albe
Дата:
On Thu, 2020-11-26 at 09:07 -0800, Adrian Klaver wrote:
> So even if Mats where to break this query:
> 
> INSERT INTO foreign.labels (address, labels)
> SELECT address_id, ARRAY_AGG(name) AS labels
> FROM labels
> GROUP BY 1
> LIMIT 100;
> 
> down into something like this:
> 
> INSERT INTO foreign.labels (address, labels)
> VALUES (), (), (), ();
> 
> postgres_fdw would send it as individual INSERTs?

Yes, that's the way the FDW API works.

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com




Re: postgres_fdw insert extremely slow

От
pabloa98
Дата:


On Thu, Nov 26, 2020 at 8:25 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
On Thu, 2020-11-26 at 09:07 -0800, Adrian Klaver wrote:
> So even if Mats where to break this query:
>
> INSERT INTO foreign.labels (address, labels)
> SELECT address_id, ARRAY_AGG(name) AS labels
> FROM labels
> GROUP BY 1
> LIMIT 100;
>
> down into something like this:
>
> INSERT INTO foreign.labels (address, labels)
> VALUES (), (), (), ();
>
> postgres_fdw would send it as individual INSERTs?

Yes, that's the way the FDW API works.

I can confirm that it is NOT like FDW API work. I have the same problem and in the fdw database it receives:

INSERT INTO foreign.labels (address, labels) VALUES ();

One value at a time.

Same scenario, same performance. No network related. It is just a batch insert is split into 1 insert by row.


Pablo

Re: postgres_fdw insert extremely slow

От
Mats Julian Olsen
Дата:
On 27.11.2020 10:11, pabloa98 wrote:


On Thu, Nov 26, 2020 at 8:25 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
On Thu, 2020-11-26 at 09:07 -0800, Adrian Klaver wrote:
> So even if Mats where to break this query:
>
> INSERT INTO foreign.labels (address, labels)
> SELECT address_id, ARRAY_AGG(name) AS labels
> FROM labels
> GROUP BY 1
> LIMIT 100;
>
> down into something like this:
>
> INSERT INTO foreign.labels (address, labels)
> VALUES (), (), (), ();
>
> postgres_fdw would send it as individual INSERTs?

Yes, that's the way the FDW API works.

I can confirm that it is NOT like FDW API work. I have the same problem and in the fdw database it receives:

INSERT INTO foreign.labels (address, labels) VALUES ();

One value at a time.

Same scenario, same performance. No network related. It is just a batch insert is split into 1 insert by row.


Pablo

Thank you all for your replies.

We've decided not to use postgres_fdw for the time being because of this behavior. Hope to revisit in the future.

Best,

-- 
Mats
CTO @ Dune Analytics
We're hiring: https://careers.duneanalytics.com

Re: postgres_fdw insert extremely slow

От
pabloa98
Дата:
I would like to suggest for postgres_fdw: If the foreign database is PostgreSQL, the link should just pass through all the CRUD SQL commands to the other database.

If the other database is of a version so different that cannot make sense of the CRUD SQL command, it will generate an error and that's it.

This would be very useful to keep datasets synchronized.

Pablo

Re: postgres_fdw insert extremely slow

От
"David G. Johnston"
Дата:
On Fri, Nov 27, 2020 at 2:00 PM pabloa98 <pabloa98@gmail.com> wrote:
I would like to suggest for postgres_fdw: If the foreign database is PostgreSQL,

Just to be clear, the "postgres" part of the name means the remote database must be a PostgreSQL database, there is no "if".  Likewise, for the extension mysql_fdw the remote database is "MySQL".
 
the link should just pass through all the CRUD SQL commands to the other database.

If the other database is of a version so different that cannot make sense of the CRUD SQL command, it will generate an error and that's it.

This would be very useful to keep datasets synchronized.


We already offer a tool for that, dblink.


But the generalized behavior of FDW doesn't work to just send a raw SQL command across, even for CRUD.
e.g.,
UPDATE remote_tbl
SET ...
FROM local_tbl
WHERE remote_tbl.col_id=local_tbl.col_id;
or
INSERT INTO remote_tbl SELECT * FROM local_tbl;

Not saying that there isn't room for improvement here but I'm doubting it's simple.

David J.

Re: postgres_fdw insert extremely slow

От
Craig Ringer
Дата:


On Thu, Nov 26, 2020 at 12:37 AM Mats Julian Olsen <mats@duneanalytics.com> wrote:

We have just set up postgres_fdw between two postgres databases, x and y, with the plan to periodically insert data from x into y.

We've successfully set up the connection with a few options: `use_remote_estimate 'true'` and `fetch_size '50000'` (the latter we've played around with). We've run ANALYZE on the foreign server.

SELECTs against the foreign table returns in milliseconds, however an INSERT of 100 rows takes 10 seconds. we have roughly 200 000 rows for the initial sync, which translates into ~6 hours.

Is this the expected performance of postgre_fdw? Is there anything we've overlooked when setting this up? Very curious to hear experiences from the community when doing read/write and not just read from foreign sources.

Are your inserts run in individual transactions or grouped into one transaction? If the latter, commit time will be a factor.

What's the round-trip time (ping time) to the foreign server? Since postgres_fdw runs each individual insert as a separate statement, you're going to face insert times of (n * RTT) for inserts. Assuming negligible time for insert execution on the foreign server, your runtime is 21600 seconds for 200000 rows, i.e. 9.25 rows/second or 0.108 seconds/row. That would be consistent with a 90-100ms ping time to the foreign server.

You'll be pleased to know that there is currently work ongoing in pgsql-hackers to add the capability to batch INSERTs in postgres_fdw to improve performance on higher latency links. See https://www.postgresql.org/message-id/flat/20200628151002.7x5laxwpgvkyiu3q%40development . That could well reduce your RTTs immensely. Try the patch out if you can and report back please.

If you can get the client application to manage the foreign insert directly, then handle commit consistency using two-phase commit, you should be able to do the insert in half an hour or less instead (assuming ~10ms execution time per insert and 90ms RTT). If you use `COPY`, or if you can use JDBC to benefit from PgJDBC's support for the JDBC addBatch() and executeBatch() APIs, you should be able to get it down way lower than that. Assuming your RTT latency is 90ms and you spend 10ms executing each insert, your insert time might well go down to 0.010 * 200000 + 90*2 = 2180 seconds or about 36 minutes. If you can insert a row in 3ms with COPY, 13 minutes.

There's work ongoing on making libpq (which underlies postgres_fdw) capable of running multiple statements at the same time, i.e. "pipelining". That won't immediately benefit postgres_fdw because using it in postgres_fdw would require changes to the whole postgres executor as well. But if adopted, it'd allow postgres_fdw to achieve that sort of performance transparently.