Re: [HACKERS] Page Scan Mode in Hash Index

Поиск

Список

Период

Сортировка

От	Robert Haas
Тема	Re: [HACKERS] Page Scan Mode in Hash Index
Дата	19 сентября 2017 г. 19:19:21
Msg-id	CA+Tgmoa+w73dnX1CiiZnY3pAKqsVvwYX=n3KMDj9uWf6NVC6Kg@mail.gmail.com обсуждение исходный текст
Ответ на	Re: [HACKERS] Page Scan Mode in Hash Index (Jesper Pedersen <jesper.pedersen@redhat.com>)
Ответы	Re: [HACKERS] Page Scan Mode in Hash Index Re: [HACKERS] Page Scan Mode in Hash Index
Список	pgsql-hackers

Дерево обсуждения

On Thu, Aug 24, 2017 at 11:26 AM, Jesper Pedersen
<jesper.pedersen@redhat.com> wrote:
> Based on the feedback in this thread, I have moved the patch to "Ready for
> Committer".

Reviewing 0001:

_hash_readpage gets the page LSN to see if we can apply LP_DEAD hints,
but if the table is unlogged or temporary, the LSN will never change,
so the test in _hash_kill_items will always think that it's OK to
apply the hints.  (This seems like it might be a pretty serious
problem, because I'm not sure what would be a viable workaround.)

The logic that tries to ensure that so->currPos.{buf,currPage,lsn} get
updated is not, to my eyes, obviously correct. Generally, the logic
for this stuff seems unnaturally spread out to me.  For example,
_hash_next() updates currPos.buf, but leaves it to _hash_readpage to
set currPage and lsn.  That function also sets all three fields when
it advances to the next page by calling _hash_readnext(), but when it
tries to advance to the next page and doesn't find one it sets buf but
not currPage or lsn.  It seems to me that this logic should be more
centralized somehow.  Can we arrange things so that at least buf,
currPage, and lsn, and maybe also nextPage and prevPage, get updated
at the same time and as soon after reading the buffer as possible?

It would be bad if a primary bucket page's hasho_prevblkno field got
copied into so->currPos.prevpage, because the value that appears for
the primary bucket is not a real block number.  But _hash_readpage
seems like it can bring about this exact situation, because of this
code:

+            if (!BlockNumberIsValid(opaque->hasho_nextblkno))
+                prev_blkno = opaque->hasho_prevblkno;
...
+                so->currPos.prevPage = prev_blkno;

If we're reading the primary bucket page and there are no overflow
pages, hasho_nextblkno will not be valid and hasho_prevblkno won't be
a real block number.

Incidentally, the "if" statement in the above block of code is
probably not saving anything; I would suggest for clarity that you do
the assignment unconditionally (but this is just a matter of style, so
I don't feel super-strongly about it).

+    return (so->currPos.firstItem <= so->currPos.lastItem);

Can this ever return false?  It looks to me like we should never reach
this code unless that condition holds, and that it would be a bug if
we did.  If that's correct, maybe this should be an Assert() and the
return statement should just return true.

+        buf = _hash_getbuf(rel, blkno, HASH_READ, LH_OVERFLOW_PAGE);
+
+        /* It might not exist anymore; in which case we can't hint it. */
+        if (!BufferIsValid(buf))
+            return;

This is dead code, because _hash_getbuf always returns a valid buffer.
If there's actually a risk of the buffer disappearing, then some other
handling is needed for this case.  But I suspect that, since a scan
always holds a pin on the primary bucket, there's actually no way for
this to happen and this is just dead code.

The comment in hashgetbitmap claims that _hash_first() or _hash_next()
never returns dead tuples.  If that were true, it would be a bug,
because then scans started during recovery would return wrong answers.
A more accurate statement would be something like: _hash_first and
_hash_next handle eliminate dead index entries whenever
scan->ignored_killed_tuples is true.  Therefore, there's nothing to do
here except add the results to the TIDBitmap.

_hash_readpage contains unnecessary "continue" statements inside the
loops.  The reason that they're unnecessary is that there's no code
below that in the loop anyway, so the loop is already going to go back
around to the top. Whether to change this is a matter of style, so I
won't complain too much if you want to leave it the way it is, but
personally I'd favor removing them.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

В списке pgsql-hackers по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: [HACKERS] Page Scan Mode in Hash Index