Re: Improving connection scalability: GetSnapshotData()

Поиск
Список
Период
Сортировка
От Andres Freund
Тема Re: Improving connection scalability: GetSnapshotData()
Дата
Msg-id 20200331200438.nctcnrvpyca4oepv@alap3.anarazel.de
обсуждение исходный текст
Ответ на Re: Improving connection scalability: GetSnapshotData()  (Andres Freund <andres@anarazel.de>)
Ответы Re: Improving connection scalability: GetSnapshotData()  (Andres Freund <andres@anarazel.de>)
Список pgsql-hackers
Hi,

I'm still fighting with snapshot_too_old. The feature is just badly
undertested, underdocumented, and there's lots of other oddities. I've
now spent about as much time on that feature than on the whole rest of
the patchset.

As an example for under-documented, here's a definitely non-trivial
block of code without a single comment explaining what it's doing.

                if (oldSnapshotControl->count_used > 0 &&
                    ts >= oldSnapshotControl->head_timestamp)
                {
                    int            offset;

                    offset = ((ts - oldSnapshotControl->head_timestamp)
                              / USECS_PER_MINUTE);
                    if (offset > oldSnapshotControl->count_used - 1)
                        offset = oldSnapshotControl->count_used - 1;
                    offset = (oldSnapshotControl->head_offset + offset)
                        % OLD_SNAPSHOT_TIME_MAP_ENTRIES;
                    xlimit = oldSnapshotControl->xid_by_minute[offset];

                    if (NormalTransactionIdFollows(xlimit, recentXmin))
                        SetOldSnapshotThresholdTimestamp(ts, xlimit);
                }

                LWLockRelease(OldSnapshotTimeMapLock);

Also, SetOldSnapshotThresholdTimestamp() acquires a separate spinlock -
not great to call that with the lwlock held.


Then there's this comment:

        /*
         * Failsafe protection against vacuuming work of active transaction.
         *
         * This is not an assertion because we avoid the spinlock for
         * performance, leaving open the possibility that xlimit could advance
         * and be more current; but it seems prudent to apply this limit.  It
         * might make pruning a tiny bit less aggressive than it could be, but
         * protects against data loss bugs.
         */
        if (TransactionIdIsNormal(latest_xmin)
            && TransactionIdPrecedes(latest_xmin, xlimit))
            xlimit = latest_xmin;

        if (NormalTransactionIdFollows(xlimit, recentXmin))
            return xlimit;

So this is not using lock, so the values aren't accurate, but it avoids
data loss bugs? I also don't know which spinlock is avoided on the path
here as mentioend - the acquisition is unconditional.

But more importantly - if this is about avoiding data loss bugs, how on
earth is it ok that we don't go through these checks in the
old_snapshot_threshold == 0 path?

        /*
         * Zero threshold always overrides to latest xmin, if valid.  Without
         * some heuristic it will find its own snapshot too old on, for
         * example, a simple UPDATE -- which would make it useless for most
         * testing, but there is no principled way to ensure that it doesn't
         * fail in this way.  Use a five-second delay to try to get useful
         * testing behavior, but this may need adjustment.
         */
        if (old_snapshot_threshold == 0)
        {
            if (TransactionIdPrecedes(latest_xmin, MyProc->xmin)
                && TransactionIdFollows(latest_xmin, xlimit))
                xlimit = latest_xmin;

            ts -= 5 * USECS_PER_SEC;
            SetOldSnapshotThresholdTimestamp(ts, xlimit);

            return xlimit;
        }


This feature looks like it was put together by applying force until
something gave, and then stopping just there.


Greetings,

Andres Freund



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Alvaro Herrera
Дата:
Сообщение: Re: [HACKERS] Restricting maximum keep segments by repslots
Следующее
От: Justin Pryzby
Дата:
Сообщение: Re: pg_ls_tmpdir to show directories and shared filesets (andpg_ls_*)