Re: [GENERAL] Logcal replication in large scale

Поиск
Список
Период
Сортировка
От Magnus Hagander
Тема Re: [GENERAL] Logcal replication in large scale
Дата
Msg-id CABUevEzcyqRgNDoHLsempevA8Yj_43LuYnbf0mvWiXP00VDs8Q@mail.gmail.com
обсуждение исходный текст
Ответ на [GENERAL] Logcal replication in large scale  (Nicklas Avén <nicklas.aven@jordogskog.no>)
Ответы Re: [GENERAL] Logcal replication in large scale  (Nicklas Avén <nicklas.aven@jordogskog.no>)
Список pgsql-general
On Wed, Sep 20, 2017 at 8:53 AM, Nicklas Avén <nicklas.aven@jordogskog.no> wrote:
Hallo all

I am thrilled about logical replication in PostgreSQL 10. My head have
started spinning about use cases.

Would it be possible to use logical replication as a distribution
method of data?

As an answer to the generic  question: yes :)

 
I think about map data from national mapping authorities. The problem
is to get the updates of their data sets. Especially the open data sets
are now distributed as files (shape files) in Sweden and as pg_dump in
Norway.

I guess it is beyond what logical replication is designed for, so I ask
what problems that might arise for a scenario like:

The distributor has a publication database with logical replication
publications of the tables. All users, probably thousands or more,
would subscribe to that publication to get an updated copy of the data
set.

How would the publication server react? I guess the WAL-files will only
be written once anyway?

Yes. But they will  need to be kept around until *all* subscribers have pulled down their changes. So even one subscriber falling behind will mean your WAL will never get cleaned up.

Of course, you can keep some sort of watcher process that kills old replication slots.

I am also not sure how well PostgreSQL will react to having thousands of replication slots. It's not what the system was designed for I believe :)

You might be better of using logical decoding (which goes back to 9.4) to stream the data out, but not having each subscriber be a postgresql subscriber. Either using it to generate some sort of "diff files" that can then be consumed downstream, or by distributing it via some kind of dedicated queuing system designed to handle that many downstreams.

 
My guess is that it will be a lower payload than today anyway when the
whole data set have to be fetched to get updates.

As always that's going to depend on the number of updates. If the same thing is updated 1000 times, then if you use logical replication it gets transferred 1000 times. So there are definitely cases when there will be *more* traffic with logical replication, but in cases like yours I would guess it will be less.

//Magnus

В списке pgsql-general по дате отправления:

Предыдущее
От: Michael Paquier
Дата:
Сообщение: Re: [GENERAL] Logical decoding client has the power to crash the server
Следующее
От: Thomas Güttler
Дата:
Сообщение: Re: [GENERAL] Final pg_dumpall should happen in Single-User-Mode