Обсуждение: Column lookup in a row performance

Поиск
Список
Период
Сортировка

Column lookup in a row performance

От
Павлухин Иван
Дата:
Hi PostgreSQL Community,

I am learning deeply how tuples are organized and column values are
accessed in different databases. As far as undertood postgres does not
store all column positions in a tuple (e.g. in header or footer). In
contrast MySQL InnoDB stores column lengths in a record header [1].
From the first glance it seems that a postgres format can have a
significant performance penalty when accessing a single column which
is located after multiple variable-length columns because searching a
column value position in a row requires multiple jumps. And in InnoDB
a position of a particular column can be found right after reading a
header.

I found several related threads in pgsql-hackers archives [2,3]
describing significant performance wins in a prototype.

Does anyone know why the format is still the same? Perhaps InnoDB and
similar formats are not so good, are they?

Please respond if you have the clue!

[1] https://dev.mysql.com/doc/refman/8.0/en/innodb-row-format.html#innodb-row-format-compact
[2] https://www.postgresql.org/message-id/flat/c58979e50702201307w64b12892uf8dfc3d8bf117ec0%40mail.gmail.com
[3] https://www.postgresql.org/message-id/flat/87irj16umm.fsf%40enterprisedb.com

-- 
Best regards,
Ivan Pavlukhin


Re: Column lookup in a row performance

От
David Rowley
Дата:
On Fri, 22 Mar 2019 at 19:13, Павлухин Иван <vololo100@gmail.com> wrote:
> I am learning deeply how tuples are organized and column values are
> accessed in different databases. As far as undertood postgres does not
> store all column positions in a tuple (e.g. in header or footer). In
> contrast MySQL InnoDB stores column lengths in a record header [1].
> From the first glance it seems that a postgres format can have a
> significant performance penalty when accessing a single column which
> is located after multiple variable-length columns because searching a
> column value position in a row requires multiple jumps. And in InnoDB
> a position of a particular column can be found right after reading a
> header.

When the tuple contains no NULLs, PostgreSQL does cache the offsets to
the attribute position in the tuple up until the first variable length
field. This allows code to directly access the value without having to
deform all fields that come before the required field, and since the
offset value is the same for all tuples of this type, then it can be
stored just once, in what we call the tuple descriptor.

I'm not aware of what innodb does, but if it stores actual offsets
fields that come after a variable length field, then that sounds like
something that needs to be stored per-tuple, so there's a trade-off;
speed vs storage space.   In PostgreSQL, one way you can obtain faster
access is to not have NULLs and put fixed width fields first. That's,
of course, not always possible, but useful to keep in mind when
deciding the order to have your columns in the table.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services