Duplicating data folder without tablespace, for read access

From: Jack Cushman
Subject: Duplicating data folder without tablespace, for read access
Date:
Msg-id: CAEv_OHUyxVEL+yt89dLYV5gGPxAkSSNCr=Afxg6S9RuYnags8g@mail.gmail.com
Replies: Re: Duplicating data folder without tablespace, for read access  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Duplicating data folder without tablespace, for read access  (Stephen Frost <sfrost@snowman.net>)
Re: Duplicating data folder without tablespace, for read access  ("Peter J. Holzer" <hjp-pgsql@hjp.at>)
Re: Duplicating data folder without tablespace, for read access  (Jack Cushman <jcushman@gmail.com>)
List: pgsql-general
Hi --

I'm wondering whether, in my specific situation, it would be safe to copy a database cluster's data folder, and bring up the copy for read access, without copying a tablespace linked from it. My situation (described below) involves a database with a 100GB table and a 600GB table where I want to routinely clone just the 100GB table for web access.

---

For context, the last discussion I've found is from 2013, in a blog post from Christophe Pettus and a response from Tom Lane:


In that discussion, Christophe summarized the situation this way:

> I would not count on it.  And if it works 100% reliably now, it might not on a future version of PostgreSQL.

> As Josh Berkus pointed out to me off-list, there are two competing definitions of the term "recover" in use here:

> 1. In my blog post, the definition of "recover" was "bring up the database without having unusually extensive knowledge of PostgreSQL's internals."
> 2. For Tom, the definition of "recover" is "bring up the database if you have appropriate knowledge of PostgreSQL's internals."

> You can't recover from the loss of a tablespace per definition #1.  You can per definition #2.

> I'd strongly suggest that relying on definition #2, while absolutely correct, is a poor operational decision for most users.


---

Now here's the situation where I want to do what Christophe said not to do. :)

I have a large database of text, with a 600GB table and a 100GB table connected by a join table. They both see occasional updates throughout the week. Once a week I want to "cut a release," meaning I will clone just the 100GB table and copy it to a "release" server for read-only web access.
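For concreteness, the 600GB table gets moved onto its own tablespace ahead of time, roughly like this (the database, table, index, and path names here are all placeholders, not my real ones):

    # One-time setup on the primary; every name and path is a placeholder.
    psql -d corpus -c "CREATE TABLESPACE big_ts LOCATION '/mnt/big/pg_big_ts'"
    psql -d corpus -c "ALTER TABLE big_table SET TABLESPACE big_ts"
    # SET TABLESPACE moves the table but not its indexes, so move those too:
    psql -d corpus -c "ALTER INDEX big_table_pkey SET TABLESPACE big_ts"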

My procedure would be (sketched as a shell script after the list):

- keep the 600GB table on a separate tablespace
- cleanly stop postgres on both servers
- copy the data folder to the release server
- delete pg_tblspc/* on the release server
- start postgres on both servers
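In shell terms, something like this (the data directory path, the "release" hostname, and the stop mode are assumptions for illustration, not tested commands):

    # Sketch of the weekly release cut; paths and hostname are placeholders.
    pg_ctl -D /pgdata stop -m fast                 # primary: clean shutdown
    ssh release 'pg_ctl -D /pgdata stop -m fast'   # release server: clean shutdown
    rsync -a --delete /pgdata/ release:/pgdata/    # copy the whole cluster over
    ssh release 'rm -f /pgdata/pg_tblspc/*'        # drop the tablespace symlink(s)
    pg_ctl -D /pgdata start                        # bring the primary back up
    ssh release 'pg_ctl -D /pgdata start'          # bring the release copy up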

In local testing this seems to work: the release server comes up fine, and I only get an error if I try to access the missing tables, which is expected. But are there reasons this is going to bite me if I try it in production? I'm hoping it helps that (a) I'm only doing read access, (b) I can cleanly stop both servers before cutting a release, and (c) I'm not worried about losing data, since it's just an access copy.
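In testing, I check which relations will actually go missing on the release copy with a catalog query along these lines ('big_ts' being the placeholder tablespace name from above):

    # List every relation stored on the separate tablespace.
    psql -d corpus -c "
      SELECT c.relname, c.relkind
      FROM pg_class c
      JOIN pg_tablespace t ON c.reltablespace = t.oid
      WHERE t.spcname = 'big_ts';"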

Alternatives I've considered:

- I could pg_dump and restore (roughly the one-liner after this list), but the 100GB table has lots of indexes and I'd rather not have to reindex on the release server each week.
- I could replicate with pglogical and use some sort of blue-green setup on the release server to cut a release, but this adds a lot of moving parts, especially to deal with schema migrations.
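The pg_dump alternative would look something like this (names and hostname are again placeholders); the restore rebuilds every index on the 100GB table on each release, which is the cost I'm trying to avoid:

    # Dump just the 100GB table and restore it on the release server.
    # All names here are placeholders.
    pg_dump -Fc -t small_table corpus \
      | pg_restore --clean -h release -d corpus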

Thanks for any advice you might have!

-Jack
