Re: [HACKERS] Custom compression methods

Поиск
Список
Период
Сортировка
От Dilip Kumar
Тема Re: [HACKERS] Custom compression methods
Дата
Msg-id CAFiTN-uUUwJnNq1gi5UDj-LAQPOQV0PsJ_=30QLn6HFUK98RZg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [HACKERS] Custom compression methods  (Mark Dilger <mark.dilger@enterprisedb.com>)
Список pgsql-hackers
On Wed, Sep 2, 2020 at 4:57 AM Mark Dilger <mark.dilger@enterprisedb.com> wrote:
>
>
>
> > On Aug 13, 2020, at 4:48 AM, Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > v1-0001: As suggested by Robert, it provides the syntax support for
> > setting the compression method for a column while creating a table and
> > adding columns.  However, we don't support changing the compression
> > method for the existing column.  As part of this patch, there is only
> > one built-in compression method that can be set (pglz).  In this, we
> > have one in-build am (pglz) and the compressed attributes will directly
> > store the oid of the AM.  In this patch, I have removed the
> > pg_attr_compresion as we don't support changing the compression
> > for the existing column so we don't need to preserve the old
> > compressions.
>
> I do not like the way pglz compression is handled in this patch.  After upgrading PostgreSQL to the first version
withthis patch included, pre-existing on-disk compressed data will not include any custom compression Oid in the
header,and toast_decompress_datum will notice that and decompress the data directly using pglz_decompress.   If the
samedata were then written back out, perhaps to another table, into a column with no custom compression method defined,
itwill get compressed by toast_compress_datum using DefaultCompressionOid, which is defined as PGLZ_COMPRESSION_AM_OID.
That isn't a proper round-trip for the data, as when it gets re-compressed, the PGLZ_COMPRESSION_AM_OID gets written
intothe header, which makes the data a bit longer, but also means that it is not byte-for-byte the same as it was,
whichis counter-intuitive.  Given that any given pglz compressed datum now has two totally different formats that might
occuron disk, code may have to consider both of them, which increases code complexity, and regression tests will need
tobe written with coverage for both of them, which increases test complexity.  It's also not easy to write the extra
tests,as there isn't any way (that I see) to intentionally write out the traditional shorter form from a newer database
server;you'd have to do something like a pg_upgrade test where you install an older server to write the older format,
upgrade,and then check that the new server can handle it. 
>
> The cleanest solution to this would seem to be removal of the compression am's Oid from the header for all
compressionams, so that pre-patch written data and post-patch written data look exactly the same.  The other solution
isto give pglz pride-of-place as the original compression mechanism, and just say that when pglz is the compression
method,no Oid gets written to the header, and only when other compression methods are used does the Oid get written.
Thissecond option seems closer to the implementation that you already have, because you already handle the
decompressionof data where the Oid is lacking, so all you have to do is intentionally not write the Oid when
compressingusing pglz. 
>
> Or did I misunderstand your implementation?

Thanks for looking into it.  Actually, I am planning to change this
patch such that we will use the upper 2 bits of the size field instead
of storing the amoid for the builtin compression methods.
e. g. 0x00 = pglz, 0x01 = zlib, 0x10 = other built-in, 0x11 -> custom
compression method.  And when 0x11 is set then only we will store the
amoid in the toast header.  I think after a week or two I will make
these changes and post my updated patch.


--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Dilip Kumar
Дата:
Сообщение: Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
Следующее
От: Dilip Kumar
Дата:
Сообщение: Re: Re: [HACKERS] Custom compression methods