Re: logical changeset generation v6

From: Robert Haas
Subject: Re: logical changeset generation v6
Date:
Msg-id CA+TgmobhDCHuckL_86wRDWJ31Gw3Y1HrQ4yUKEn7U1_hTbeVqQ@mail.gmail.com
In response to: Re: logical changeset generation v6  (Andres Freund <andres@2ndquadrant.com>)
Responses: Re: logical changeset generation v6  (Andres Freund <andres@2ndquadrant.com>)
List: pgsql-hackers
On Thu, Sep 19, 2013 at 10:43 AM, Andres Freund <andres@2ndquadrant.com> wrote:
>> - Looking specifically at the 0004 patch, I think that the
>> RecentGlobalDataXmin changes are logically separate from the rest of
>> the patch, and that if we're going to commit them at all, it should be
>> separate from the rest of this.  I think this is basically a
>> performance optimization.  AFAICS, there's no correctness problem with
>> the idea of continuing to maintain a single RecentGlobalXmin; it's
>> just that you'll potentially end up with quite a lot of bloat.  But
>> that's not an argument that those two things HAVE to be committed
>> together; either could be done first, and independently of the other.
>> Also, these changes are quite complex and it's difficult to form a
>> judgement as to whether this idea is safe when they are intermingled
>> with the rest of the logical replication stuff.
>
> Up until v3 the RecentGlobalDataXmin stuff wasn't included and reviewers
> (primarily Peter G. on -hackers and Greg Stark at pgconf.eu) remarked on
> that and considered it critical. I argued for a considerable amount of
> time that it shouldn't be done in an initial patch and then gave in.
>
> They have a point though, if you e.g. replicate a pgbench -j16 workload
> the addition of RecentGlobalDataXmin reduces the performance impact of
> replication from about 60% to less than 5% in my measurements. Turns out
> heap pruning is damn critical for that kind of workload.

No question.  I'm not saying that that optimization shouldn't go in
right after the main patch does, but IMHO right now there are too many
things going in the 0004 patch to discuss them all simultaneously.
I'd like to find a way of splitting this up that will let us
block-and-tackle individual pieces of it, even if we end up committing
them all one right after the other.

But that raises an interesting question: why is the overhead so bad?
I mean, this shouldn't be any worse than having a series of
transactions running concurrently with pgbench that take a snapshot
and hold it for as long as it takes the decoding process to decode the
most-recently committed transaction.  Is the issue here that we can't
advance xmin until we've fsync'd the fruits of decoding down to disk?
If so, that's mighty painful.  But we'd really only need to hold back
xmin in that situation when some catalog change has occurred
meanwhile, which for pgbench means never.  So something seems fishy
here.
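[Editor's note: the idea above -- that the decoding xmin only needs to lag when an as-yet-unflushed catalog change exists, so a pure-DML workload like pgbench would never pin the horizon -- can be sketched as a toy model. This is hypothetical illustration, not PostgreSQL code; all names are invented.]

```python
# Toy model: the decoding process only needs to hold back xmin while
# some catalog-changing transaction has not yet been decoded and
# durably flushed to the consumer.  Pure DML never pins the horizon.

class DecodingXminTracker:
    def __init__(self):
        # oldest catalog-changing xid whose decoded output isn't flushed yet
        self.oldest_pending_catalog_xid = None

    def observe_record(self, xid, is_catalog_change):
        """Called for each WAL record seen by the decoder."""
        if is_catalog_change and self.oldest_pending_catalog_xid is None:
            self.oldest_pending_catalog_xid = xid

    def confirm_flush(self, flushed_up_to_xid):
        """Consumer acknowledges everything up to this xid is durable."""
        if (self.oldest_pending_catalog_xid is not None
                and self.oldest_pending_catalog_xid <= flushed_up_to_xid):
            self.oldest_pending_catalog_xid = None

    def effective_xmin(self, current_xid):
        """xmin reported to vacuum: only lags while a catalog change is pending."""
        if self.oldest_pending_catalog_xid is None:
            return current_xid          # pgbench-style workload: never lags
        return self.oldest_pending_catalog_xid
```

Under this model, DML-only records let xmin track the current xid, and only an unflushed DDL transaction drags it backwards.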

> I thought about the general case for a good bit and decided that all
> solutions that work in a more general scenario are complex enough that I
> don't want to implement them. And I don't really see any backward
> compatibility concerns here - removing the logic of using a separate
> horizon for user tables in contrast to system tables is pretty trivial
> and shouldn't have any external effect. Except pegging the horizon more,
> but that's what the new approach would fix, right?

Hmm, maybe.

>> Also, while I'm trying to keep this email focused on high-level
>> concerns, I have to say that guaranteedlyLogged has got to be one of
>> the worst variable names I've ever seen, starting (but not ending)
>> with the fact that guaranteedly is not a word.  I'm also tempted to
>> say that all of the wal_level=logical changes should be split out as
>> their own patch, separate from the decoding stuff.  Personally, I
>> would find that much easier to review, although I admit it's less
>> critical here than for the RecentGlobalDataXmin stuff.
>
> I can do that again and it actually was that way in the past. But
> there's no user for it before the later patch, and it's hard to
> understand the reasoning for the changed WAL logging separately;
> that's why I merged it at some point.

OK.  If I'm committing it, I'd prefer to handle that piece separately,
if possible.

>> - If XLOG_HEAP_INPLACE is not decoded, doesn't that mean that this
>> facility can't be used to replicate a system catalog table?  Is that a
>> restriction we should enforce/document somehow?
>
> Currently catalog tables aren't replicated, yes. They simply are skipped
> during decoding. XLOG_HEAP_INPLACE isn't the primary reason for that
> though.
>
> Do you see a usecase for it?

I can imagine someone wanting to do it, but I think we can live with
it not being supported.

>> - It still bothers me that we're going to have mandatory slots for
>> logical replication and no slots for physical replication.  Why are
>> slots mandatory in one case and not even allowable in the other?  Is
>> it sensible to think about slotless logical replication - or maybe I
>> should say, is it any LESS sensible than slotless physical
>> replication?
>
> Well, as you know, I do want to have slots for physical replication as
> well. But there actually is a fundamental difference why we need it for
> logical rep and not for physical: In physical replication, if the xmin
> progresses too far, client queries will be cancelled. Annoying but not
> fatal. In logical replication we will not be able to continue
> replicating since we cannot decode the WAL stream without a valid
> catalog snapshot. If xmin already has progressed too far the tuples
> won't be there anymore.
>
> If people think this needs to be a general facility from the start, I
> can be convinced that way, but I think there's so much to discuss around
> the semantics and different usecases that I'd much prefer to discuss
> that later.

I'm worried that if we don't know how the physical replication slots
are going to work, they'll end up being randomly different from the
logical replication slots, and that'll be an API wart which we'll have
a hard time getting rid of later.

>> - What is the purpose of (Un)SuspendDecodingSnapshots?  It seems that
>> should be explained somewhere.  I have my doubts about how safe that
>> is.
>
> I'll document the details if they aren't right now. Consider what
> happens if somebody does something like: "VACUUM FULL pg_am;". If we
> were to build the relation descriptor of pg_am in an "historical
> snapshot", as you coin it, we'd have the wrong filenode in there. And
> consequently any future lookups in pg_am will open a file that doesn't
> exist.
> That problem only exists for non-nailed relations that are accessed
> during decoding.

But if it's some user table flagged with the terribly-named
treat_as_catalog_table flag, then they could have not only changed the
relfilenode but also the tupledesc.  And then you can't just wave your
hands at the problem.
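[Editor's note: the stale-relfilenode hazard being discussed can be modeled as a cache-invalidation problem: a relation descriptor built under an old snapshot still names a filenode that a rewrite such as `VACUUM FULL pg_am` has since replaced. A minimal toy sketch, with invented names and not actual PostgreSQL code:]

```python
# Toy model of the stale-relfilenode hazard: a table rewrite assigns a
# new filenode; a descriptor cached from before the rewrite still
# points at the old, now-deleted file.

storage = {}   # filenode -> file contents (stands in for on-disk files)
catalog = {}   # relname  -> current filenode (stands in for pg_class)

def create_rel(name, filenode):
    catalog[name] = filenode
    storage[filenode] = f"data of {name}"

def rewrite_rel(name, new_filenode):
    """Model of VACUUM FULL: copy data into a new file, drop the old one."""
    old = catalog[name]
    storage[new_filenode] = storage.pop(old)   # old file ceases to exist
    catalog[name] = new_filenode

create_rel("pg_am", 16384)
cached_descriptor = catalog["pg_am"]   # descriptor built before the rewrite
rewrite_rel("pg_am", 16390)

# A lookup through the stale descriptor would open a nonexistent file,
# while the current catalog entry points at the live one.
assert cached_descriptor not in storage
assert catalog["pg_am"] in storage
```

If the table's tupledesc can change too (as with a user table marked catalog-like), the cached descriptor is wrong in shape as well as location, which is the harder half of the problem.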

>>  And I definitely think that SetupDecodingSnapshots() is not OK.
>> Overwriting the satisfies functions in static pointers may be a great
>> way to make sure you've covered all bases during development, but I
>> can't see us wanting that ugliness in the official sources.
>
> Yes, I don't like it either. I am not sure what to replace it with
> though. It's easy enough to fit something in GetCatalogSnapshot() and I
> don't have a problem with that, but I am less happy with adding code
> like that to GetSnapshotData() for callers that use explicit snapshots.

I'm not sure exactly what a good solution would look like, either.  I
just think this isn't it.  :-)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


