Thread: [PERFORM] Sort-of replication for reporting purposes


[PERFORM] Sort-of replication for reporting purposes

From: Ivan Voras
Hello,

I'm investigating options for an environment which has about a dozen servers and several dozen databases on each, and they occasionally need to run huge reports which slow down other services. This is of course "legacy code". After some discussion, the idea is to offload these reports to separate servers - and that would be fairly straightforward if not for the fact that the report code creates temp tables which are not allowed on read-only hot standby replicas.

So, the next best thing would be to fiddle with the storage system and make lightweight snapshots of live database clusters (their storage volumes) and mount them on the reporting servers when needed for the reports. This is a bit messy :-)
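
Roughly what I have in mind, assuming just for the example that the volumes are on ZFS (dataset names are made up, and this presumes the storage is shared/SAN-attached so the clone can be mounted on the reporting box, or that the snapshot gets zfs send/recv'd over):

# snapshot the live cluster's volume and clone it for reporting
zfs snapshot tank/pgdata@report
zfs clone tank/pgdata@report tank/pgdata_report
zfs set mountpoint=/srv/pgdata_report tank/pgdata_report
# start a separate postmaster on the clone (different port); it comes up
# through normal crash recovery, as if the primary had crashed at snapshot time
pg_ctl -D /srv/pgdata_report -o "-p 5433" start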

I'm basically fishing for ideas. Are there any other options available which would offer fast replication-like behaviour?

If not, what practices would minimise problems with the storage snapshots idea? Any filesystem options?

Re: [PERFORM] Sort-of replication for reporting purposes

From: Scott Marlowe
On Fri, Jan 6, 2017 at 12:24 PM, Ivan Voras <ivoras@gmail.com> wrote:
> Hello,
>
> I'm investigating options for an environment which has about a dozen servers
> and several dozen databases on each, and they occasionally need to run huge
> reports which slow down other services. This is of course "legacy code".
> After some discussion, the idea is to offload these reports to separate
> servers - and that would be fairly straightforward if not for the fact that
> the report code creates temp tables which are not allowed on read-only hot
> standby replicas.
>
> So, the next best thing would be to fiddle with the storage system and make
> lightweight snapshots of live database clusters (their storage volumes) and
> mount them on the reporting servers when needed for the reports. This is a
> bit messy :-)
>
> I'm basically fishing for ideas. Are there any other options available which
> would offer fast replication-like behaviour ?
>
> If not, what practices would minimise problems with the storage snapshots
> idea? Any filesystem options?

I've always solved this with Slony replication, but pg_basebackup
should be pretty good for making sort-of up-to-date slave copies. Just
toss in a recovery.conf file and, when you want the copy writable,
touch whatever failover trigger file the slave expects, etc.
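
Roughly like this (host names, paths and the replication role are placeholders; assumes a 9.x-style recovery.conf):

# clone the primary and bring the copy up as a streaming standby
pg_basebackup -h primary.example.com -U replication \
    -D /var/lib/postgresql/9.6/reporting -X stream -R
# -R writes a minimal recovery.conf (standby_mode = 'on' plus
# primary_conninfo); add something like
#     trigger_file = '/tmp/promote_reporting'
# so you can make the copy read/write later just by touching that file
pg_ctl -D /var/lib/postgresql/9.6/reporting start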


Re: [PERFORM] Sort-of replication for reporting purposes

From: Ivan Voras


On 6 Jan 2017 8:30 p.m., "Scott Marlowe" <scott.marlowe@gmail.com> wrote:
On Fri, Jan 6, 2017 at 12:24 PM, Ivan Voras <ivoras@gmail.com> wrote:
> Hello,
>
> I'm investigating options for an environment which has about a dozen servers
> and several dozen databases on each, and they occasionally need to run huge
> reports which slow down other services. This is of course "legacy code".
> After some discussion, the idea is to offload these reports to separate
> servers - and that would be fairly straightforward if not for the fact that
> the report code creates temp tables which are not allowed on read-only hot
> standby replicas.
>
> So, the next best thing would be to fiddle with the storage system and make
> lightweight snapshots of live database clusters (their storage volumes) and
> mount them on the reporting servers when needed for the reports. This is a
> bit messy :-)
>
> I'm basically fishing for ideas. Are there any other options available which
> would offer fast replication-like behaviour ?
>
> If not, what practices would minimise problems with the storage snapshots
> idea? Any filesystem options?

I've always solved this with slony replication, but pg_basebackup
should be pretty good for making sort of up to date slave copies. Just
toss a recovery.conf file and touch whatever failover file the slave
expects etc.

I forgot to add one more piece of information: the databases are 50 GB+ each, so doing a base backup on demand over the network is not a great option.

Re: [PERFORM] Sort-of replication for reporting purposes

From: Rick Otten
I suggest SymmetricDS (http://symmetricds.org).

I've had good luck using it to aggregate data from a heterogeneous suite of database systems and versions back into a single back-end data mart for exactly this purpose.



On Fri, Jan 6, 2017 at 2:24 PM, Ivan Voras <ivoras@gmail.com> wrote:
Hello,

I'm investigating options for an environment which has about a dozen servers and several dozen databases on each, and they occasionally need to run huge reports which slow down other services. This is of course "legacy code". After some discussion, the idea is to offload these reports to separate servers - and that would be fairly straightforward if not for the fact that the report code creates temp tables which are not allowed on read-only hot standby replicas.

So, the next best thing would be to fiddle with the storage system and make lightweight snapshots of live database clusters (their storage volumes) and mount them on the reporting servers when needed for the reports. This is a bit messy :-)

I'm basically fishing for ideas. Are there any other options available which would offer fast replication-like behaviour ?

If not, what practices would minimise problems with the storage snapshots idea? Any filesystem options?


Re: [PERFORM] Sort-of replication for reporting purposes

From: Stephen Frost
Ivan,

* Ivan Voras (ivoras@gmail.com) wrote:
> I'm investigating options for an environment which has about a dozen
> servers and several dozen databases on each, and they occasionally need to
> run huge reports which slow down other services. This is of course "legacy
> code". After some discussion, the idea is to offload these reports to
> separate servers - and that would be fairly straightforward if not for the
> fact that the report code creates temp tables which are not allowed on
> read-only hot standby replicas.

You could create a new server which has postgres_fdw connections to your
read-only replicas and run the reporting code there.  That could suck,
of course, since the data would have to be pulled across to be
aggregated (assuming that's what your reporting script is doing).

If you can't change the reporting script at all, that might be what you
have to do, though. Be sure to look at the postgres_fdw options that
control fetch/batch size and how planning is done.
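
A rough sketch of that setup on the new reporting box (all names here are invented; fetch_size and use_remote_estimate are the options I mean, and fetch_size needs a reasonably recent postgres_fdw):

psql -d reporting <<'SQL'
CREATE EXTENSION IF NOT EXISTS postgres_fdw;
CREATE SERVER replica1 FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'replica1.example.com', dbname 'appdb',
             fetch_size '10000', use_remote_estimate 'true');
CREATE USER MAPPING FOR CURRENT_USER SERVER replica1
    OPTIONS (user 'report_ro', password 'secret');
CREATE SCHEMA replica1_data;
IMPORT FOREIGN SCHEMA public FROM SERVER replica1 INTO replica1_data;
SQL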

If you can change the reporting script, another option is to create FDWs
on your primary servers with FDW tables that point to some other server
and then have the reporting script use the FDW tables as the temp or
destination tables on the replica.  The magic here is that FDW tables on
a read-only replica *can* be written to, but you have to create the FDW
and the FDW tables on the primary and let them be replicated.
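
Sketched out with invented names (note that all of the DDL runs on the primary, so it streams down to the standby; the real scratch table has to exist on the box the foreign server points at):

psql -h primary.example.com -d appdb <<'SQL'
CREATE EXTENSION IF NOT EXISTS postgres_fdw;
CREATE SERVER report_sink FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'scratch-box.example.com', dbname 'scratch');
CREATE USER MAPPING FOR report_user SERVER report_sink
    OPTIONS (user 'scratch_rw', password 'secret');
-- stands in for whatever temp/destination table the report used to create
CREATE FOREIGN TABLE report_scratch (id bigint, payload text)
    SERVER report_sink OPTIONS (table_name 'report_scratch');
SQL
# the reporting script, connected to the read-only replica, then writes into
# report_scratch instead of creating a temp table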

As also mentioned, you could use trigger-based replication (eg: bucardo,
slony, etc) instead of block-based, or you could look at the logical
replication capabilities (pg_logical) to see about using that for your
replica-for-reporting instead.

Thanks!

Stephen


Re: [PERFORM] Sort-of replication for reporting purposes

From: ProPAAS DBA

On 01/06/2017 12:24 PM, Ivan Voras wrote:
> Hello,
>
> I'm investigating options for an environment which has about a dozen
> servers and several dozen databases on each, and they occasionally
> need to run huge reports which slow down other services. This is of
> course "legacy code". After some discussion, the idea is to offload
> these reports to separate servers - and that would be fairly
> straightforward if not for the fact that the report code creates temp
> tables which are not allowed on read-only hot standby replicas.
>
> So, the next best thing would be to fiddle with the storage system and
> make lightweight snapshots of live database clusters (their storage
> volumes) and mount them on the reporting servers when needed for the
> reports. This is a bit messy :-)
>
> I'm basically fishing for ideas. Are there any other options available
> which would offer fast replication-like behaviour ?
>
> If not, what practices would minimise problems with the storage
> snapshots idea? Any filesystem options?
>

You could have a look at Slony - it locks the replicated tables into
read-only on the subscriber, but the standby cluster itself remains
read/write. As an added bonus you could replicate everything into a
single reporting database cluster, in separate schemas. There are lots
and lots of features in Slony that give you flexibility.

http://slony.info/

I can't speak from direct experience, but I think pg_logical may offer
similar features.





Re: [PERFORM] Sort-of replication for reporting purposes

From: Stuart Bishop


On 7 January 2017 at 02:33, Ivan Voras <ivoras@gmail.com> wrote:


On 6 Jan 2017 8:30 p.m., "Scott Marlowe" <scott.marlowe@gmail.com> wrote:
On Fri, Jan 6, 2017 at 12:24 PM, Ivan Voras <ivoras@gmail.com> wrote:
> Hello,
>
> I'm investigating options for an environment which has about a dozen servers
> and several dozen databases on each, and they occasionally need to run huge
> reports which slow down other services. This is of course "legacy code".
> After some discussion, the idea is to offload these reports to separate
> servers - and that would be fairly straightforward if not for the fact that
> the report code creates temp tables which are not allowed on read-only hot
> standby replicas.
>
> So, the next best thing would be to fiddle with the storage system and make
> lightweight snapshots of live database clusters (their storage volumes) and
> mount them on the reporting servers when needed for the reports. This is a
> bit messy :-)
>
> I'm basically fishing for ideas. Are there any other options available which
> would offer fast replication-like behaviour ?
>
> If not, what practices would minimise problems with the storage snapshots
> idea? Any filesystem options?

I've always solved this with slony replication, but pg_basebackup
should be pretty good for making sort of up to date slave copies. Just
toss a recovery.conf file and touch whatever failover file the slave
expects etc.

I forgot to add one more information, the databases are 50G+ each so doing the base backup on demand over the network is not a great option.

If you don't want to rebuild your report databases, you can use PostgreSQL's built-in replication to keep them in sync. Just promote the replica to a primary, run your reports, then wind it back to a standby and let it catch up. pg_rewind might be able to wind it back, or you could use a filesystem snapshot taken before you promoted the replica to a primary. You do need to ensure that the real primary keeps enough WAL to cover the period your reporting database is broken out of replication.
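
Per reporting cycle that would look something like this (untested sketch, paths and conninfo invented; pg_rewind needs 9.5+ and wal_log_hints = on or data checksums on the cluster):

PGDATA=/var/lib/postgresql/9.6/reporting
pg_ctl -D "$PGDATA" promote              # the standby becomes read/write
# ... run the reports here; temp tables now work ...
pg_ctl -D "$PGDATA" stop -m fast
pg_rewind --target-pgdata="$PGDATA" \
    --source-server='host=primary.example.com dbname=postgres user=postgres'
# put back a recovery.conf pointing at the primary and rejoin as a standby
cp /etc/postgresql/recovery.conf.reporting "$PGDATA"/recovery.conf
pg_ctl -D "$PGDATA" start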

Personally though, I'd take the opportunity to set up WAL shipping and point-in-time recovery on your primary, and rebuild your reporting database regularly from those backups. You get a fresh reporting database on demand without overloading the primary, and you regularly test your backups.
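
In configuration terms it is something like this (every path, host name and timestamp here is invented):

# on the primary, in postgresql.conf:
#   archive_mode    = on
#   archive_command = 'rsync -a %p backup.example.com:/srv/wal_archive/%f'
# plus a periodic base backup:
pg_basebackup -h primary.example.com -U replication -D /srv/base/$(date +%F)

# to (re)build the reporting copy, restore the latest base backup into the
# data directory, then replay WAL from the archive up to a target time:
cat > /var/lib/postgresql/9.6/reporting/recovery.conf <<'EOF'
restore_command = 'rsync -a backup.example.com:/srv/wal_archive/%f %p'
recovery_target_time = '2017-01-13 06:00:00'
recovery_target_action = 'promote'    # finish recovery read/write for reporting
EOF
pg_ctl -D /var/lib/postgresql/9.6/reporting start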



--

Re: [PERFORM] Sort-of replication for reporting purposes

From: Ivan Voras
On 13 January 2017 at 12:00, Stuart Bishop <stuart@stuartbishop.net> wrote:


On 7 January 2017 at 02:33, Ivan Voras <ivoras@gmail.com> wrote:



I forgot to add one more information, the databases are 50G+ each so doing the base backup on demand over the network is not a great option.

If you don't want to rebuild your report databases, you can use PostgreSQL built in replication to keep them in sync. Just promote the replica to a primary, run your reports, then wind it back to a standby and let it catch up.


Ah, that's a nice option, didn't know about pg_rewind! I need to read about it some more...
So far, it seems like the best one.

 
Personally though, I'd take the opportunity to set up wal shipping and point in time recovery on your primary, and rebuild your reporting database regularly from these backups. You get your fresh reporting database on demand without overloading the primary, and regularly test your backups.

I don't think that would solve the main problem. If I set up WAL shipping, then the secondary server will periodically need to ingest the logs, right? And then I'm either back to running it for a while and rewinding it, as you've said, or basically restoring it from scratch every time, which will be slower than just doing a base backup, right?



Re: [PERFORM] Sort-of replication for reporting purposes

From: "Phillip Couto"
Why not utilize the pglogical plugin from 2ndQuadrant? They demonstrate your use case on its webpage, and it is free.
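
From memory the setup is roughly this (database, host and node names invented; it needs wal_level = logical and pglogical in shared_preload_libraries on both ends, so check the docs):

# on the primary (provider):
psql -d appdb <<'SQL'
CREATE EXTENSION pglogical;
SELECT pglogical.create_node(
    node_name := 'provider1',
    dsn := 'host=primary.example.com dbname=appdb');
SELECT pglogical.replication_set_add_all_tables('default', ARRAY['public']);
SQL

# on the reporting server (subscriber), which stays fully read/write:
psql -d appdb <<'SQL'
CREATE EXTENSION pglogical;
SELECT pglogical.create_node(
    node_name := 'reporting1',
    dsn := 'host=reporting.example.com dbname=appdb');
SELECT pglogical.create_subscription(
    subscription_name := 'reporting_sub',
    provider_dsn := 'host=primary.example.com dbname=appdb');
SQL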

Phillip Couto
From: ivoras@gmail.com
Sent: January 13, 2017 06:20
To: stuart@stuartbishop.net
Cc: scott.marlowe@gmail.com; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Sort-of replication for reporting purposes

On 13 January 2017 at 12:00, Stuart Bishop <stuart@stuartbishop.net> wrote:


On 7 January 2017 at 02:33, Ivan Voras <ivoras@gmail.com> wrote:



I forgot to add one more information, the databases are 50G+ each so doing the base backup on demand over the network is not a great option.

If you don't want to rebuild your report databases, you can use PostgreSQL built in replication to keep them in sync. Just promote the replica to a primary, run your reports, then wind it back to a standby and let it catch up.


Ah, that's a nice option, didn't know about pg_rewind! I need to read about it some more...
So far, it seems like the best one.

 
Personally though, I'd take the opportunity to set up wal shipping and point in time recovery on your primary, and rebuild your reporting database regularly from these backups. You get your fresh reporting database on demand without overloading the primary, and regularly test your backups.

I don't think that would solve the main problem. If I set up WAL shipping, then the secondary server will periodically need to ingest the logs, right? And then I'm either back to running it for a while and rewinding it, as you've said, or basically restoring it from scratch every time which will be slower than just doing a base backup, right?



Re: [PERFORM] Sort-of replication for reporting purposes

From: Stuart Bishop
On 13 January 2017 at 18:17, Ivan Voras <ivoras@gmail.com> wrote:
On 13 January 2017 at 12:00, Stuart Bishop <stuart@stuartbishop.net> wrote:


On 7 January 2017 at 02:33, Ivan Voras <ivoras@gmail.com> wrote:



I forgot to add one more information, the databases are 50G+ each so doing the base backup on demand over the network is not a great option.

If you don't want to rebuild your report databases, you can use PostgreSQL built in replication to keep them in sync. Just promote the replica to a primary, run your reports, then wind it back to a standby and let it catch up.


Ah, that's a nice option, didn't know about pg_rewind! I need to read about it some more...
So far, it seems like the best one.

 
Personally though, I'd take the opportunity to set up wal shipping and point in time recovery on your primary, and rebuild your reporting database regularly from these backups. You get your fresh reporting database on demand without overloading the primary, and regularly test your backups.

I don't think that would solve the main problem. If I set up WAL shipping, then the secondary server will periodically need to ingest the logs, right? And then I'm either back to running it for a while and rewinding it, as you've said, or basically restoring it from scratch every time which will be slower than just doing a base backup, right?

It is solving a different problem (reliable, tested backups). As a side effect, you end up with a copy of your main database that you can run reports on. I'm suggesting that maybe the slow restoration of the database is not actually a problem, but instead that you can use it to your advantage. Maybe this fits into your bigger picture. Or maybe having a dozen hot standbys of your existing dozen servers is a better option for you.

--