Обсуждение: Optimize partial TOAST decompression
Hi, hackers!
I'm a student participating in GSoC 2019 and my project is related to TOAST slices.
When I'm getting familiar with the postgresql codebase, I find that PG_DETOAST_DATUM_SLICE, when to run on a compressed TOAST entry, will fetch all compressed data chunks then extract the relevant slice. Obviously, this is unnecessary, we only need to fetch the data chunks we need.
The patch optimizes partial TOAST decompression.
For an example of the improvement possible, this trivial example:
---------------------------------------------------------------------
create table slicingtest (
id serial primary key,
a text
);
insert into slicingtest (a) select repeat('1234567890-=abcdefghijklmnopqrstuvwxyz', 1000000) as a from generate_series(1,100);
\timing
select sum(length(substr(a, 0, 20))) from slicingtest;
---------------------------------------------------------------------
environment: Linux 4.15.0-33-generic #36~16.04.1-Ubuntu x86_64 GNU/Linux
On master, I get
Time: 28.123 ms (Take ten times average)
With the patch, I get
Time: 2.306 ms (take ten times average)
This seems to have a 10x improvement. If the number of toast data chunks is more, I believe that patch can play a greater role, there are about 200 related TOAST data chunks for each entry in the case.
			
		I'm a student participating in GSoC 2019 and my project is related to TOAST slices.
When I'm getting familiar with the postgresql codebase, I find that PG_DETOAST_DATUM_SLICE, when to run on a compressed TOAST entry, will fetch all compressed data chunks then extract the relevant slice. Obviously, this is unnecessary, we only need to fetch the data chunks we need.
The patch optimizes partial TOAST decompression.
For an example of the improvement possible, this trivial example:
---------------------------------------------------------------------
create table slicingtest (
id serial primary key,
a text
);
insert into slicingtest (a) select repeat('1234567890-=abcdefghijklmnopqrstuvwxyz', 1000000) as a from generate_series(1,100);
\timing
select sum(length(substr(a, 0, 20))) from slicingtest;
---------------------------------------------------------------------
environment: Linux 4.15.0-33-generic #36~16.04.1-Ubuntu x86_64 GNU/Linux
On master, I get
Time: 28.123 ms (Take ten times average)
With the patch, I get
Time: 2.306 ms (take ten times average)
This seems to have a 10x improvement. If the number of toast data chunks is more, I believe that patch can play a greater role, there are about 200 related TOAST data chunks for each entry in the case.
Related discussion: https://www.postgresql.org/message-id/flat/CACowWR07EDm7Y4m2kbhN_jnys%3DBBf9A6768RyQdKm_%3DNpkcaWg%40mail.gmail.com
Best regards, Binguo Bao.
Вложения
Hi, Binguo! > 2 июня 2019 г., в 19:48, Binguo Bao <djydewang@gmail.com> написал(а): > > Hi, hackers! .... > This seems to have a 10x improvement. If the number of toast data chunks is more, I believe that patch can play a greaterrole, there are about 200 related TOAST data chunks for each entry in the case. That's really cool that you could produce meaningful patch long before end of GSoC! I'll describe what is going on a little: 1. We have compressed value, which resides in TOAST table. 2. We want only some fraction of this value. We want some prefix with length L. 3. Previously Paul Ramsey submitted patch that omits decompression of value beyond desired L bytes. 4. Binguo's patch tries to do not fetch compressed data which will not bee needed to decompressor. In fact it fetches L bytesfrom TOAST table. This is not correct: L bytes of compressed data do not always can be decoded into at least L bytes of data. At worst we haveone control byte per 8 bytes of literal bytes. This means at most we need (L*9 + 8) / 8 bytes with current pglz format. Also, I'm not sure you use SET_VARSIZE_COMPRESSED correctly... Best regards, Andrey Borodin.
> This is not correct: L bytes of compressed data do not always can be decoded into at least L bytes of data. At worst we have one control byte per 8 bytes of literal bytes. This means at most we need (L*9 + 8) / 8 bytes with current pglz format.
> Also, I'm not sure you use SET_VARSIZE_COMPRESSED correctly...
Good catch! I've corrected the related code in the patch.
I followed the code in toast_fetch_datum function[1], and I didn't see any wrong with it.
Best regards, Binguo Bao
Andrey Borodin <x4mmm@yandex-team.ru> 于2019年6月23日周日 下午5:23写道:
Hi, Binguo!
> 2 июня 2019 г., в 19:48, Binguo Bao <djydewang@gmail.com> написал(а):
>
> Hi, hackers!
....
> This seems to have a 10x improvement. If the number of toast data chunks is more, I believe that patch can play a greater role, there are about 200 related TOAST data chunks for each entry in the case.
That's really cool that you could produce meaningful patch long before end of GSoC!
I'll describe what is going on a little:
1. We have compressed value, which resides in TOAST table.
2. We want only some fraction of this value. We want some prefix with length L.
3. Previously Paul Ramsey submitted patch that omits decompression of value beyond desired L bytes.
4. Binguo's patch tries to do not fetch compressed data which will not bee needed to decompressor. In fact it fetches L bytes from TOAST table.
This is not correct: L bytes of compressed data do not always can be decoded into at least L bytes of data. At worst we have one control byte per 8 bytes of literal bytes. This means at most we need (L*9 + 8) / 8 bytes with current pglz format.
Also, I'm not sure you use SET_VARSIZE_COMPRESSED correctly...
Best regards, Andrey Borodin.
Вложения
Hi! Please, do not use top-posting, i.e. reply style where you quote whole message under your response. It makes reading of archivesterse. > 24 июня 2019 г., в 7:53, Binguo Bao <djydewang@gmail.com> написал(а): > >> This is not correct: L bytes of compressed data do not always can be decoded into at least L bytes of data. At worst wehave one control byte per 8 bytes of literal bytes. This means at most we need (L*9 + 8) / 8 bytes with current pglz format. > > Good catch! I've corrected the related code in the patch. > ... > <0001-Optimize-partial-TOAST-decompression-2.patch> I've took a look into the code. I think we should extract function for computation of max_compressed_size and put it somewhere along with pglz code. Justin case something will change something about pglz so that they would not forget about compression algorithm assumption. Also I suggest just using 64 bit computation to avoid overflows. And I think it worth to check if max_compressed_size iswhole data and use min of (max_compressed_size, uncompressed_data_size). Also you declared needsize and max_compressed_size too far from use. But this will be solved by function extraction anyway. Thanks! Best regards, Andrey Borodin.
Hi!
> Andrey Borodin <x4mmm@yandex-team.ru> 于2019年6月29日周六 下午9:48写道:
Hi!
Please, do not use top-posting, i.e. reply style where you quote whole message under your response. It makes reading of archives terse.
> 24 июня 2019 г., в 7:53, Binguo Bao <djydewang@gmail.com> написал(а):
>
>> This is not correct: L bytes of compressed data do not always can be decoded into at least L bytes of data. At worst we have one control byte per 8 bytes of literal bytes. This means at most we need (L*9 + 8) / 8 bytes with current pglz format.
>
> Good catch! I've corrected the related code in the patch.
> ...
> <0001-Optimize-partial-TOAST-decompression-2.patch>
I've took a look into the code.
I think we should extract function for computation of max_compressed_size and put it somewhere along with pglz code. Just in case something will change something about pglz so that they would not forget about compression algorithm assumption.
Also I suggest just using 64 bit computation to avoid overflows. And I think it worth to check if max_compressed_size is whole data and use min of (max_compressed_size, uncompressed_data_size).
Also you declared needsize and max_compressed_size too far from use. But this will be solved by function extraction anyway.
Thanks!
Best regards, Andrey Borodin.
Thanks for the suggestion.
I've extracted function for computation for max_compressed_size and put the function into pg_lzcompress.c.
Best regards, Binguo Bao.
Вложения
On Mon, Jul 1, 2019 at 6:46 AM Binguo Bao <djydewang@gmail.com> wrote: > > Andrey Borodin <x4mmm@yandex-team.ru> 于2019年6月29日周六 下午9:48写道: >> I've took a look into the code. >> I think we should extract function for computation of max_compressed_size and put it somewhere along with pglz code. Justin case something will change something about pglz so that they would not forget about compression algorithm assumption. >> >> Also I suggest just using 64 bit computation to avoid overflows. And I think it worth to check if max_compressed_sizeis whole data and use min of (max_compressed_size, uncompressed_data_size). >> >> Also you declared needsize and max_compressed_size too far from use. But this will be solved by function extraction anyway. >> > Thanks for the suggestion. > I've extracted function for computation for max_compressed_size and put the function into pg_lzcompress.c. This looks good to me. A little commentary around why pglz_maximum_compressed_size() returns a universally correct answer (there's no way the compressed size can ever be larger than this because...) would be nice for peasants like myself. If you're looking to continue down this code line in your next patch, the next TODO item is a little more involved: a user-land (ala PG_DETOAST_DATUM) iterator API for access of TOAST datums would allow the optimization of searching of large objects like JSONB types, and so on, where the thing you are looking for is not at a known location in the object. So, things like looking for a particular substring in a string, or looking for a particular key in a JSONB. "Iterate until you find the thing." would allow optimization of some code lines that currently require full decompression of the objects. P.
Paul Ramsey <pramsey@cleverelephant.ca> 于2019年7月2日周二 下午10:46写道:
This looks good to me. A little commentary around why
pglz_maximum_compressed_size() returns a universally correct answer
(there's no way the compressed size can ever be larger than this
because...) would be nice for peasants like myself.
If you're looking to continue down this code line in your next patch,
the next TODO item is a little more involved: a user-land (ala
PG_DETOAST_DATUM) iterator API for access of TOAST datums would allow
the optimization of searching of large objects like JSONB types, and
so on, where the thing you are looking for is not at a known location
in the object. So, things like looking for a particular substring in a
string, or looking for a particular key in a JSONB. "Iterate until you
find the thing." would allow optimization of some code lines that
currently require full decompression of the objects.
P.
Thanks for your comment. I've updated the patch.
As for the iterator API, I've implemented a de-TOAST iterator actually[0].
And I’m looking for more of its application scenarios and perfecting it.
Any comments would be much appreciated.
Best Regards, Binguo Bao.
Вложения
> 3 июля 2019 г., в 18:06, Binguo Bao <djydewang@gmail.com> написал(а): > > Paul Ramsey <pramsey@cleverelephant.ca> 于2019年7月2日周二 下午10:46写道: > This looks good to me. A little commentary around why > pglz_maximum_compressed_size() returns a universally correct answer > (there's no way the compressed size can ever be larger than this > because...) would be nice for peasants like myself. > ... > > Thanks for your comment. I've updated the patch. Thanks Biguo and Paul! From my POV patch looks ready for committer, so I switched state on CF. Best regards, Andrey Borodin.
On Thu, Jul 04, 2019 at 11:10:24AM +0200, Andrey Borodin wrote:
>
>> 3 июля 2019 г., в 18:06, Binguo Bao <djydewang@gmail.com> написал(а):
>>
>> Paul Ramsey <pramsey@cleverelephant.ca> 于2019年7月2日周二 下午10:46写道:
>> This looks good to me. A little commentary around why
>> pglz_maximum_compressed_size() returns a universally correct answer
>> (there's no way the compressed size can ever be larger than this
>> because...) would be nice for peasants like myself.
>> ...
>>
>> Thanks for your comment. I've updated the patch.
>
>
> Thanks Biguo and Paul! From my POV patch looks ready for committer, so I switched state on CF.
>
I've done a bit of testing and benchmaring on this patch today, and
there's a bug somewhere, making it look like there are corrupted data.
What I'm seeing is this:
CREATE TABLE t (a text);
-- attached is data for one row
COPY t FROM '/tmp/t.data';
SELECT length(substr(a,1000)) from t;
psql: ERROR:  compressed data is corrupted
SELECT length(substr(a,0,1000)) from t;
 length 
--------
    999
(1 row)
SELECT length(substr(a,1000)) from t;
psql: ERROR:  invalid memory alloc request size 2018785106
That's quite bizarre behavior - it does work with a prefix, but not with
suffix. And the exact ERROR changes after the prefix query. (Of course,
on master it works in all cases.)
The backtrace (with the patch applied) looks like this:
#0  toast_decompress_datum (attr=0x12572e0) at tuptoaster.c:2291
#1  toast_decompress_datum (attr=0x12572e0) at tuptoaster.c:2277
#2  0x00000000004c3b08 in heap_tuple_untoast_attr_slice (attr=<optimized out>, sliceoffset=0, slicelength=-1) at
tuptoaster.c:315
#3  0x000000000085c1e5 in pg_detoast_datum_slice (datum=<optimized out>, first=<optimized out>, count=<optimized out>)
atfmgr.c:1767
 
#4  0x0000000000833b7a in text_substring (str=133761519127512, start=0, length=<optimized out>,
length_not_specified=<optimizedout>) at varlena.c:956
 
...
I've only observed this with a very small number of rows (the  data is
generated randomly with different compressibility etc.), so I'm only
attaching one row that exhibits this issue.
My guess is toast_fetch_datum_slice() gets confused by the headers or
something, or something like that. FWIW the new code added to this
function does not adhere to our code style, and would deserve some
additional explanation of what it's doing/why. Same for the
heap_tuple_untoast_attr_slice, BTW.
regards
-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 
			
		Of course, I forgot to attach the files, so here they are. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Вложения
Tomas Vondra <tomas.vondra@2ndquadrant.com> 于2019年7月5日周五 上午1:46写道:
I've done a bit of testing and benchmaring on this patch today, and
there's a bug somewhere, making it look like there are corrupted data.
What I'm seeing is this:
CREATE TABLE t (a text);
-- attached is data for one row
COPY t FROM '/tmp/t.data';
SELECT length(substr(a,1000)) from t;
psql: ERROR: compressed data is corrupted
SELECT length(substr(a,0,1000)) from t;
length
--------
999
(1 row)
SELECT length(substr(a,1000)) from t;
psql: ERROR: invalid memory alloc request size 2018785106
That's quite bizarre behavior - it does work with a prefix, but not with
suffix. And the exact ERROR changes after the prefix query. (Of course,
on master it works in all cases.)
The backtrace (with the patch applied) looks like this:
#0 toast_decompress_datum (attr=0x12572e0) at tuptoaster.c:2291
#1 toast_decompress_datum (attr=0x12572e0) at tuptoaster.c:2277
#2 0x00000000004c3b08 in heap_tuple_untoast_attr_slice (attr=<optimized out>, sliceoffset=0, slicelength=-1) at tuptoaster.c:315
#3 0x000000000085c1e5 in pg_detoast_datum_slice (datum=<optimized out>, first=<optimized out>, count=<optimized out>) at fmgr.c:1767
#4 0x0000000000833b7a in text_substring (str=133761519127512, start=0, length=<optimized out>, length_not_specified=<optimized out>) at varlena.c:956
...
I've only observed this with a very small number of rows (the data is
generated randomly with different compressibility etc.), so I'm only
attaching one row that exhibits this issue.
My guess is toast_fetch_datum_slice() gets confused by the headers or
something, or something like that. FWIW the new code added to this
function does not adhere to our code style, and would deserve some
additional explanation of what it's doing/why. Same for the
heap_tuple_untoast_attr_slice, BTW.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi, Tomas!
Thanks for your testing and the suggestion.
That's quite bizarre behavior - it does work with a prefix, but not with
suffix. And the exact ERROR changes after the prefix query.
I think bug is caused by "#2  0x00000000004c3b08 in heap_tuple_untoast_attr_slice (attr=<optimized out>, sliceoffset=0, slicelength=-1) at tuptoaster.c:315",
since I ignore the case where slicelength is negative, and I've appended some comments for heap_tuple_untoast_attr_slice for the case.
FWIW the new code added to this
function does not adhere to our code style, and would deserve some
additional explanation of what it's doing/why. Same for the
heap_tuple_untoast_attr_slice, BTW.
I've added more comments to explain the code's behavior.
Besides, I also modified the macro "TOAST_COMPRESS_RAWDATA" to "TOAST_COMPRESS_DATA" since
it is used to get toast compressed data rather than raw data.
Best Regards, Binguo Bao.
Вложения
On Sat, Jul 06, 2019 at 02:27:56AM +0800, Binguo Bao wrote:
>Hi, Tomas!
>Thanks for your testing and the suggestion.
>
>That's quite bizarre behavior - it does work with a prefix, but not with
>> suffix. And the exact ERROR changes after the prefix query.
>
>
>I think bug is caused by "#2  0x00000000004c3b08 in
>heap_tuple_untoast_attr_slice (attr=<optimized out>, sliceoffset=0,
>slicelength=-1) at tuptoaster.c:315",
>since I ignore the case where slicelength is negative, and I've appended
>some comments for heap_tuple_untoast_attr_slice for the case.
>
>FWIW the new code added to this
>> function does not adhere to our code style, and would deserve some
>> additional explanation of what it's doing/why. Same for the
>> heap_tuple_untoast_attr_slice, BTW.
>
>
>I've added more comments to explain the code's behavior.
>Besides, I also modified the macro "TOAST_COMPRESS_RAWDATA" to
>"TOAST_COMPRESS_DATA" since
>it is used to get toast compressed data rather than raw data.
>
Thanks, this seems to address the issue - I can no longer reproduce the
crashes, allowing the benchmark to complete. I'm attaching the script I
used and spreadsheet with a summary of results.
For the cases I've tested (100k - 10M values, different compressibility,
different prefix/length values), the results are kinda mixed - many
cases got much faster (~2x), but other cases got slower too. We're
however talking about queries taking a couple of miliseconds, so in
absolute numbers the differences are small.
That does not mean the optimization is useless - but the example shared
at the beginning of this thread is quite extreme, as the values are
extremely compressible. Each value is ~38MB (in plaintext), but a table
with 100 such values has only ~40MB. Tha's 100:1 compression ratio,
which I think is not typical for real-world data sets.
The data I've used are less extreme, depending on the fraction of random
data in values.
I went through the code too. I've reworder a couple of comments and code
style issues, but there are a couple of more serious issues.
1) Why rename TOAST_COMPRESS_RAWDATA to TOAST_COMPRESS_DATA?
This seems unnecessary, and it discards the clear hint that it's about
accessing the *raw* data, and the relation to TOAST_COMPRESS_RAWSIZE.
IMHO we should keep the original naming.
2) pglz_maximum_compressed_size signatures are confusing
There are two places with pglz_maximum_compressed_size signature, and
those places are kinda out of sync when it comes to parameter names:
  int32
  pglz_maximum_compressed_size(int32 raw_slice_size,
                               int32 total_compressed_size)
  extern
  int32 pglz_maximum_compressed_size(int32 raw_slice_size,
                                     int32 raw_size);
Also, pg_lzcompress.c has no concept of a "slice" because it only deals
with simple compression, slicing is responsibility of the tuptoaster. So
we should not mix those two, not even in comments.
I propose tweaks per the attached patch - I think it makes the code
clearer, and it's mostly cosmetic stuff. But I haven't tested the
changes beyond "it compiles".
regards
-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
			
		Вложения
On Sat, Jul 06, 2019 at 05:23:37PM +0200, Tomas Vondra wrote: >On Sat, Jul 06, 2019 at 02:27:56AM +0800, Binguo Bao wrote: >>Hi, Tomas! >>Thanks for your testing and the suggestion. >> >>That's quite bizarre behavior - it does work with a prefix, but not with >>>suffix. And the exact ERROR changes after the prefix query. >> >> >>I think bug is caused by "#2 0x00000000004c3b08 in >>heap_tuple_untoast_attr_slice (attr=<optimized out>, sliceoffset=0, >>slicelength=-1) at tuptoaster.c:315", >>since I ignore the case where slicelength is negative, and I've appended >>some comments for heap_tuple_untoast_attr_slice for the case. >> >>FWIW the new code added to this >>>function does not adhere to our code style, and would deserve some >>>additional explanation of what it's doing/why. Same for the >>>heap_tuple_untoast_attr_slice, BTW. >> >> >>I've added more comments to explain the code's behavior. >>Besides, I also modified the macro "TOAST_COMPRESS_RAWDATA" to >>"TOAST_COMPRESS_DATA" since >>it is used to get toast compressed data rather than raw data. >> > >Thanks, this seems to address the issue - I can no longer reproduce the >crashes, allowing the benchmark to complete. I'm attaching the script I >used and spreadsheet with a summary of results. > >For the cases I've tested (100k - 10M values, different compressibility, >different prefix/length values), the results are kinda mixed - many >cases got much faster (~2x), but other cases got slower too. We're >however talking about queries taking a couple of miliseconds, so in >absolute numbers the differences are small. > >That does not mean the optimization is useless - but the example shared >at the beginning of this thread is quite extreme, as the values are >extremely compressible. Each value is ~38MB (in plaintext), but a table >with 100 such values has only ~40MB. Tha's 100:1 compression ratio, >which I think is not typical for real-world data sets. > >The data I've used are less extreme, depending on the fraction of random >data in values. > >I went through the code too. I've reworder a couple of comments and code >style issues, but there are a couple of more serious issues. > > >1) Why rename TOAST_COMPRESS_RAWDATA to TOAST_COMPRESS_DATA? > >This seems unnecessary, and it discards the clear hint that it's about >accessing the *raw* data, and the relation to TOAST_COMPRESS_RAWSIZE. >IMHO we should keep the original naming. > > >2) pglz_maximum_compressed_size signatures are confusing > >There are two places with pglz_maximum_compressed_size signature, and >those places are kinda out of sync when it comes to parameter names: > > int32 > pglz_maximum_compressed_size(int32 raw_slice_size, > int32 total_compressed_size) > > extern > int32 pglz_maximum_compressed_size(int32 raw_slice_size, > int32 raw_size); > >Also, pg_lzcompress.c has no concept of a "slice" because it only deals >with simple compression, slicing is responsibility of the tuptoaster. So >we should not mix those two, not even in comments. > > >I propose tweaks per the attached patch - I think it makes the code >clearer, and it's mostly cosmetic stuff. But I haven't tested the >changes beyond "it compiles". > > >regards > FWIW I've done another round of tests with larger values (up to ~10MB) and the larger the values the better the speedup (a bit as expected). Attached is the script I used and a spreasheet with a summary. We still need a patch addressing the review comments, though. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Вложения
Tomas Vondra <tomas.vondra@2ndquadrant.com> 于2019年7月10日周三 上午5:12写道:
On Sat, Jul 06, 2019 at 05:23:37PM +0200, Tomas Vondra wrote:
>On Sat, Jul 06, 2019 at 02:27:56AM +0800, Binguo Bao wrote:
>>Hi, Tomas!
>>Thanks for your testing and the suggestion.
>>
>>That's quite bizarre behavior - it does work with a prefix, but not with
>>>suffix. And the exact ERROR changes after the prefix query.
>>
>>
>>I think bug is caused by "#2 0x00000000004c3b08 in
>>heap_tuple_untoast_attr_slice (attr=<optimized out>, sliceoffset=0,
>>slicelength=-1) at tuptoaster.c:315",
>>since I ignore the case where slicelength is negative, and I've appended
>>some comments for heap_tuple_untoast_attr_slice for the case.
>>
>>FWIW the new code added to this
>>>function does not adhere to our code style, and would deserve some
>>>additional explanation of what it's doing/why. Same for the
>>>heap_tuple_untoast_attr_slice, BTW.
>>
>>
>>I've added more comments to explain the code's behavior.
>>Besides, I also modified the macro "TOAST_COMPRESS_RAWDATA" to
>>"TOAST_COMPRESS_DATA" since
>>it is used to get toast compressed data rather than raw data.
>>
>
>Thanks, this seems to address the issue - I can no longer reproduce the
>crashes, allowing the benchmark to complete. I'm attaching the script I
>used and spreadsheet with a summary of results.
>
>For the cases I've tested (100k - 10M values, different compressibility,
>different prefix/length values), the results are kinda mixed - many
>cases got much faster (~2x), but other cases got slower too. We're
>however talking about queries taking a couple of miliseconds, so in
>absolute numbers the differences are small.
>
>That does not mean the optimization is useless - but the example shared
>at the beginning of this thread is quite extreme, as the values are
>extremely compressible. Each value is ~38MB (in plaintext), but a table
>with 100 such values has only ~40MB. Tha's 100:1 compression ratio,
>which I think is not typical for real-world data sets.
>
>The data I've used are less extreme, depending on the fraction of random
>data in values.
>
>I went through the code too. I've reworder a couple of comments and code
>style issues, but there are a couple of more serious issues.
>
>
>1) Why rename TOAST_COMPRESS_RAWDATA to TOAST_COMPRESS_DATA?
>
>This seems unnecessary, and it discards the clear hint that it's about
>accessing the *raw* data, and the relation to TOAST_COMPRESS_RAWSIZE.
>IMHO we should keep the original naming.
>
>
>2) pglz_maximum_compressed_size signatures are confusing
>
>There are two places with pglz_maximum_compressed_size signature, and
>those places are kinda out of sync when it comes to parameter names:
>
> int32
> pglz_maximum_compressed_size(int32 raw_slice_size,
> int32 total_compressed_size)
>
> extern
> int32 pglz_maximum_compressed_size(int32 raw_slice_size,
> int32 raw_size);
>
>Also, pg_lzcompress.c has no concept of a "slice" because it only deals
>with simple compression, slicing is responsibility of the tuptoaster. So
>we should not mix those two, not even in comments.
>
>
>I propose tweaks per the attached patch - I think it makes the code
>clearer, and it's mostly cosmetic stuff. But I haven't tested the
>changes beyond "it compiles".
>
>
>regards
>
FWIW I've done another round of tests with larger values (up to ~10MB)
and the larger the values the better the speedup (a bit as expected).
Attached is the script I used and a spreasheet with a summary.
We still need a patch addressing the review comments, though.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi, Tomas!
Sorry for the late reply. 
Thank you for further testing, I am trying to reproduce your first test summary, 
since I think the performance of the patch will not drop in almost all cases. 
Besides, If a value is composed of random characters, 
it will be hard to be compressed, because pglz requires a 25% compression ratio by default or not worth it.
This means that querying the value will not trigger the patch. But the first test results show that the patch
is slower than the master when the value is composed of random characters, which is confusing.
From the second test result, we can infer that the first test result 
was indeed affected by a random disturbance in the case of a small time-consuming.
> We still need a patch addressing the review comments, though.
done:)
Вложения
On Wed, Jul 10, 2019 at 01:35:25PM +0800, Binguo Bao wrote: >Tomas Vondra <tomas.vondra@2ndquadrant.com> 于2019年7月10日周三 上午5:12写道: > >> On Sat, Jul 06, 2019 at 05:23:37PM +0200, Tomas Vondra wrote: >> >On Sat, Jul 06, 2019 at 02:27:56AM +0800, Binguo Bao wrote: >> >>Hi, Tomas! >> >>Thanks for your testing and the suggestion. >> >> >> >>That's quite bizarre behavior - it does work with a prefix, but not with >> >>>suffix. And the exact ERROR changes after the prefix query. >> >> >> >> >> >>I think bug is caused by "#2 0x00000000004c3b08 in >> >>heap_tuple_untoast_attr_slice (attr=<optimized out>, sliceoffset=0, >> >>slicelength=-1) at tuptoaster.c:315", >> >>since I ignore the case where slicelength is negative, and I've appended >> >>some comments for heap_tuple_untoast_attr_slice for the case. >> >> >> >>FWIW the new code added to this >> >>>function does not adhere to our code style, and would deserve some >> >>>additional explanation of what it's doing/why. Same for the >> >>>heap_tuple_untoast_attr_slice, BTW. >> >> >> >> >> >>I've added more comments to explain the code's behavior. >> >>Besides, I also modified the macro "TOAST_COMPRESS_RAWDATA" to >> >>"TOAST_COMPRESS_DATA" since >> >>it is used to get toast compressed data rather than raw data. >> >> >> > >> >Thanks, this seems to address the issue - I can no longer reproduce the >> >crashes, allowing the benchmark to complete. I'm attaching the script I >> >used and spreadsheet with a summary of results. >> > >> >For the cases I've tested (100k - 10M values, different compressibility, >> >different prefix/length values), the results are kinda mixed - many >> >cases got much faster (~2x), but other cases got slower too. We're >> >however talking about queries taking a couple of miliseconds, so in >> >absolute numbers the differences are small. >> > >> >That does not mean the optimization is useless - but the example shared >> >at the beginning of this thread is quite extreme, as the values are >> >extremely compressible. Each value is ~38MB (in plaintext), but a table >> >with 100 such values has only ~40MB. Tha's 100:1 compression ratio, >> >which I think is not typical for real-world data sets. >> > >> >The data I've used are less extreme, depending on the fraction of random >> >data in values. >> > >> >I went through the code too. I've reworder a couple of comments and code >> >style issues, but there are a couple of more serious issues. >> > >> > >> >1) Why rename TOAST_COMPRESS_RAWDATA to TOAST_COMPRESS_DATA? >> > >> >This seems unnecessary, and it discards the clear hint that it's about >> >accessing the *raw* data, and the relation to TOAST_COMPRESS_RAWSIZE. >> >IMHO we should keep the original naming. >> > >> > >> >2) pglz_maximum_compressed_size signatures are confusing >> > >> >There are two places with pglz_maximum_compressed_size signature, and >> >those places are kinda out of sync when it comes to parameter names: >> > >> > int32 >> > pglz_maximum_compressed_size(int32 raw_slice_size, >> > int32 total_compressed_size) >> > >> > extern >> > int32 pglz_maximum_compressed_size(int32 raw_slice_size, >> > int32 raw_size); >> > >> >Also, pg_lzcompress.c has no concept of a "slice" because it only deals >> >with simple compression, slicing is responsibility of the tuptoaster. So >> >we should not mix those two, not even in comments. >> > >> > >> >I propose tweaks per the attached patch - I think it makes the code >> >clearer, and it's mostly cosmetic stuff. But I haven't tested the >> >changes beyond "it compiles". >> > >> > >> >regards >> > >> >> FWIW I've done another round of tests with larger values (up to ~10MB) >> and the larger the values the better the speedup (a bit as expected). >> Attached is the script I used and a spreasheet with a summary. >> >> We still need a patch addressing the review comments, though. >> >> >> regards >> >> -- >> Tomas Vondra http://www.2ndQuadrant.com >> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services >> > >Hi, Tomas! Sorry for the late reply. Thank you for further testing, I >am trying to reproduce your first test summary, since I think the >performance of the patch will not drop in almost all cases. > I've done some changes to the test script since the first benchmark, aiming to make the results more stable 1) uses larger amount of data (10x more) 2) the data set recreated for each run (to rule out random differences in the random data affecting the runs differently) 3) minor configuration changes (more shared buffers etc.) I don't think we need to worry about small differences (within ~5%) which can easily be due to changes to binary layout. And otherwise results from the second benchmark round seem much more stable. >Besides, If a value is composed of random characters, >it will be hard to be compressed, because pglz requires a 25% compression >ratio by default or not worth it. >This means that querying the value will not trigger the patch. But the >first test results show that the patch >is slower than the master when the value is composed of random characters, >which is confusing. > Yes, I know. But the values have compressible and incompressible (random) part, so in most cases the value should be compressible, although with various compression ratio. I have not tracked the size of the loaded data so I don't know which cases happened to be compressed or not. I'll rerun the test and I'll include this information. >From the second test result, we can infer that the first test result >was indeed affected by a random disturbance in the case of a small >time-consuming. > Yes, I agree. >> We still need a patch addressing the review comments, though. >done:) Thanks. -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hello, can you please update this patch? -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, Sep 25, 2019 at 05:38:34PM -0300, Alvaro Herrera wrote: >Hello, can you please update this patch? > I'm not the patch author, but I've been looking at the patch recently and I have a rebased version at hand - so attached. FWIW I believe the patch is solid and in good shape, and it got stuck after I reported some benchmarks showing somewhat flaky performance. I know Binguo Bao was trying to reproduce the benchmark, and I assume the silence means he was not successful :-( On the larger data set the patch however performed very nicely, so maybe I just did something stupid while running the smaller one, or maybe it's just noise (the queries were just a couple of ms in that test). I do plan to rerun the benchmarks and investigate a bit - if I find the patch is fine, I'd like to commit it shortly. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Вложения
On Fri, Sep 27, 2019 at 01:00:36AM +0200, Tomas Vondra wrote:
>On Wed, Sep 25, 2019 at 05:38:34PM -0300, Alvaro Herrera wrote:
>>Hello, can you please update this patch?
>>
>
>I'm not the patch author, but I've been looking at the patch recently
>and I have a rebased version at hand - so attached.
>
>FWIW I believe the patch is solid and in good shape, and it got stuck
>after I reported some benchmarks showing somewhat flaky performance. I
>know Binguo Bao was trying to reproduce the benchmark, and I assume the
>silence means he was not successful :-(
>
>On the larger data set the patch however performed very nicely, so maybe
>I just did something stupid while running the smaller one, or maybe it's
>just noise (the queries were just a couple of ms in that test). I do
>plan to rerun the benchmarks and investigate a bit - if I find the patch
>is fine, I'd like to commit it shortly.
>
OK, I was just about to push this after polishing it a bit, but then I
noticed the patch does not address one of the points from Paul' review,
asking for comment explaining the pglz_maximum_compressed_size formula.
I mean this:
    /*
     * Use int64 to prevent overflow during calculation.
     */
    compressed_size = (int32) ((int64) rawsize * 9 + 8) / 8;
I'm not very familiar with pglz internals, but I'm a bit puzzled by
this. My first instinct was to compare it to this:
    #define PGLZ_MAX_OUTPUT(_dlen)    ((_dlen) + 4)
but clearly that's a very different (much simpler) formula. So why
shouldn't pglz_maximum_compressed_size simply use this macro?
Regarding benchmarks - as I mentioned, I've repeated the tests and
everything seems fine. The results from the two usual machines are
available in [1]. There are only very few noise-level regressions and
many very significant speedups.
I have a theory what went wrong in the first run that showed some
regressions - it's possible the machine had CPU power management
enabled. I can't check this retroactively, but it'd explain variability
for short queries, and smooth behavior on longer queries.
[1] https://github.com/tvondra/toast-optimize
regards
-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
			
		> 30 сент. 2019 г., в 20:56, Tomas Vondra <tomas.vondra@2ndquadrant.com> написал(а): > > I mean this: > > /* > * Use int64 to prevent overflow during calculation. > */ > compressed_size = (int32) ((int64) rawsize * 9 + 8) / 8; > > I'm not very familiar with pglz internals, but I'm a bit puzzled by > this. My first instinct was to compare it to this: > > #define PGLZ_MAX_OUTPUT(_dlen) ((_dlen) + 4) > > but clearly that's a very different (much simpler) formula. So why > shouldn't pglz_maximum_compressed_size simply use this macro? compressed_size accounts for possible increase of size during compression. pglz can consume up to 1 control byte for each8 bytes of data in worst case. Even if whole data is compressed well - there can be prefix compressed extremely ineffectively. Thus, if you are going todecompress rawsize bytes, you need at most compressed_size bytes of compressed input.
On Mon, Sep 30, 2019 at 09:20:22PM +0500, Andrey Borodin wrote: > > >> 30 сент. 2019 г., в 20:56, Tomas Vondra <tomas.vondra@2ndquadrant.com> написал(а): >> >> I mean this: >> >> /* >> * Use int64 to prevent overflow during calculation. >> */ >> compressed_size = (int32) ((int64) rawsize * 9 + 8) / 8; >> >> I'm not very familiar with pglz internals, but I'm a bit puzzled by >> this. My first instinct was to compare it to this: >> >> #define PGLZ_MAX_OUTPUT(_dlen) ((_dlen) + 4) >> >> but clearly that's a very different (much simpler) formula. So why >> shouldn't pglz_maximum_compressed_size simply use this macro? > >compressed_size accounts for possible increase of size during >compression. pglz can consume up to 1 control byte for each 8 bytes of >data in worst case. OK, but does that actually translate in to the formula? We essentially need to count 8-byte chunks in raw data, and multiply that by 9. Which gives us something like nchunks = ((rawsize + 7) / 8) * 9; which is not quite what the patch does. >Even if whole data is compressed well - there can be prefix compressed >extremely ineffectively. Thus, if you are going to decompress rawsize >bytes, you need at most compressed_size bytes of compressed input. OK, that explains why we can't use the PGLZ_MAX_OUTPUT macro. -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
> 30 сент. 2019 г., в 22:29, Tomas Vondra <tomas.vondra@2ndquadrant.com> написал(а): > > On Mon, Sep 30, 2019 at 09:20:22PM +0500, Andrey Borodin wrote: >> >> >>> 30 сент. 2019 г., в 20:56, Tomas Vondra <tomas.vondra@2ndquadrant.com> написал(а): >>> >>> I mean this: >>> >>> /* >>> * Use int64 to prevent overflow during calculation. >>> */ >>> compressed_size = (int32) ((int64) rawsize * 9 + 8) / 8; >>> >>> I'm not very familiar with pglz internals, but I'm a bit puzzled by >>> this. My first instinct was to compare it to this: >>> >>> #define PGLZ_MAX_OUTPUT(_dlen) ((_dlen) + 4) >>> >>> but clearly that's a very different (much simpler) formula. So why >>> shouldn't pglz_maximum_compressed_size simply use this macro? > >> >> compressed_size accounts for possible increase of size during >> compression. pglz can consume up to 1 control byte for each 8 bytes of >> data in worst case. > > OK, but does that actually translate in to the formula? We essentially > need to count 8-byte chunks in raw data, and multiply that by 9. Which > gives us something like > > nchunks = ((rawsize + 7) / 8) * 9; > > which is not quite what the patch does. I'm afraid neither formula is correct, but all this is hair-splitting differences. Your formula does not account for the fact that we may not need all bytes from last chunk. Consider desired decompressed size of 3 bytes. We may need 1 control byte and 3 literals, 4 bytes total But nchunks = 9. Binguo's formula is appending 1 control bit per data byte and one extra control byte. Consider size = 8 bytes. We need 1 control byte, 8 literals, 9 total. But compressed_size = 10. Mathematically correct formula is compressed_size = (int32) ((int64) rawsize * 9 + 7) / 8; Here we take one bit for each data byte, and 7 control bits for overflow. But this equations make no big difference, each formula is safe. I'd pick one which is easier to understand and document(IMO, its nchunks = ((rawsize + 7) / 8) * 9). Thanks! -- Andrey Borodin Open source RDBMS development team leader Yandex.Cloud
On Tue, Oct 01, 2019 at 11:20:39AM +0500, Andrey Borodin wrote: > > >> 30 сент. 2019 г., в 22:29, Tomas Vondra <tomas.vondra@2ndquadrant.com> написал(а): >> >> On Mon, Sep 30, 2019 at 09:20:22PM +0500, Andrey Borodin wrote: >>> >>> >>>> 30 сент. 2019 г., в 20:56, Tomas Vondra <tomas.vondra@2ndquadrant.com> написал(а): >>>> >>>> I mean this: >>>> >>>> /* >>>> * Use int64 to prevent overflow during calculation. >>>> */ >>>> compressed_size = (int32) ((int64) rawsize * 9 + 8) / 8; >>>> >>>> I'm not very familiar with pglz internals, but I'm a bit puzzled by >>>> this. My first instinct was to compare it to this: >>>> >>>> #define PGLZ_MAX_OUTPUT(_dlen) ((_dlen) + 4) >>>> >>>> but clearly that's a very different (much simpler) formula. So why >>>> shouldn't pglz_maximum_compressed_size simply use this macro? >> >>> >>> compressed_size accounts for possible increase of size during >>> compression. pglz can consume up to 1 control byte for each 8 bytes of >>> data in worst case. >> >> OK, but does that actually translate in to the formula? We essentially >> need to count 8-byte chunks in raw data, and multiply that by 9. Which >> gives us something like >> >> nchunks = ((rawsize + 7) / 8) * 9; >> >> which is not quite what the patch does. > >I'm afraid neither formula is correct, but all this is hair-splitting differences. > Sure. I just want to be sure the formula is safe and we won't end up using too low value in some corner case. >Your formula does not account for the fact that we may not need all bytes from last chunk. >Consider desired decompressed size of 3 bytes. We may need 1 control byte and 3 literals, 4 bytes total >But nchunks = 9. > OK, so essentially this means my formula works with whole chunks, i.e. if we happen to need just a part of a decompressed chunk, we still request enough data to decompress it whole. This way we may request up to 7 extra bytes, which seems fine. >Binguo's formula is appending 1 control bit per data byte and one extra >control byte. Consider size = 8 bytes. We need 1 control byte, 8 >literals, 9 total. But compressed_size = 10. > >Mathematically correct formula is compressed_size = (int32) ((int64) >rawsize * 9 + 7) / 8; Here we take one bit for each data byte, and 7 >control bits for overflow. > >But this equations make no big difference, each formula is safe. I'd >pick one which is easier to understand and document (IMO, its nchunks = >((rawsize + 7) / 8) * 9). > I'd use the *mathematically correct* formula, it doesn't seem to be any more complex, and the "one bit per byte, complete bytes" explanation seems quite understandable. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Oct 01, 2019 at 12:08:05PM +0200, Tomas Vondra wrote: >On Tue, Oct 01, 2019 at 11:20:39AM +0500, Andrey Borodin wrote: >> >> >>>30 сент. 2019 г., в 22:29, Tomas Vondra <tomas.vondra@2ndquadrant.com> написал(а): >>> >>>On Mon, Sep 30, 2019 at 09:20:22PM +0500, Andrey Borodin wrote: >>>> >>>> >>>>>30 сент. 2019 г., в 20:56, Tomas Vondra <tomas.vondra@2ndquadrant.com> написал(а): >>>>> >>>>>I mean this: >>>>> >>>>> /* >>>>> * Use int64 to prevent overflow during calculation. >>>>> */ >>>>> compressed_size = (int32) ((int64) rawsize * 9 + 8) / 8; >>>>> >>>>>I'm not very familiar with pglz internals, but I'm a bit puzzled by >>>>>this. My first instinct was to compare it to this: >>>>> >>>>> #define PGLZ_MAX_OUTPUT(_dlen) ((_dlen) + 4) >>>>> >>>>>but clearly that's a very different (much simpler) formula. So why >>>>>shouldn't pglz_maximum_compressed_size simply use this macro? >>> >>>> >>>>compressed_size accounts for possible increase of size during >>>>compression. pglz can consume up to 1 control byte for each 8 bytes of >>>>data in worst case. >>> >>>OK, but does that actually translate in to the formula? We essentially >>>need to count 8-byte chunks in raw data, and multiply that by 9. Which >>>gives us something like >>> >>> nchunks = ((rawsize + 7) / 8) * 9; >>> >>>which is not quite what the patch does. >> >>I'm afraid neither formula is correct, but all this is hair-splitting differences. >> > >Sure. I just want to be sure the formula is safe and we won't end up >using too low value in some corner case. > >>Your formula does not account for the fact that we may not need all bytes from last chunk. >>Consider desired decompressed size of 3 bytes. We may need 1 control byte and 3 literals, 4 bytes total >>But nchunks = 9. >> > >OK, so essentially this means my formula works with whole chunks, i.e. >if we happen to need just a part of a decompressed chunk, we still >request enough data to decompress it whole. This way we may request up >to 7 extra bytes, which seems fine. > >>Binguo's formula is appending 1 control bit per data byte and one extra >>control byte. Consider size = 8 bytes. We need 1 control byte, 8 >>literals, 9 total. But compressed_size = 10. >> >>Mathematically correct formula is compressed_size = (int32) ((int64) >>rawsize * 9 + 7) / 8; Here we take one bit for each data byte, and 7 >>control bits for overflow. >> >>But this equations make no big difference, each formula is safe. I'd >>pick one which is easier to understand and document (IMO, its nchunks = >>((rawsize + 7) / 8) * 9). >> > >I'd use the *mathematically correct* formula, it doesn't seem to be any >more complex, and the "one bit per byte, complete bytes" explanation >seems quite understandable. > Pushed. I've ended up using the *mathematically correct* formula, hopefully with sufficient explanation why it's correct. I've also polished a couple more comments, and pushed like that. Thanks to Binguo Bao for this improvement, and all the reviewers in this thread. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Oct 01, 2019 at 02:34:20PM +0200, Tomas Vondra wrote:
>On Tue, Oct 01, 2019 at 12:08:05PM +0200, Tomas Vondra wrote:
>>On Tue, Oct 01, 2019 at 11:20:39AM +0500, Andrey Borodin wrote:
>>>
>>>
>>>>30 сент. 2019 г., в 22:29, Tomas Vondra <tomas.vondra@2ndquadrant.com> написал(а):
>>>>
>>>>On Mon, Sep 30, 2019 at 09:20:22PM +0500, Andrey Borodin wrote:
>>>>>
>>>>>
>>>>>>30 сент. 2019 г., в 20:56, Tomas Vondra <tomas.vondra@2ndquadrant.com> написал(а):
>>>>>>
>>>>>>I mean this:
>>>>>>
>>>>>>/*
>>>>>> * Use int64 to prevent overflow during calculation.
>>>>>> */
>>>>>>compressed_size = (int32) ((int64) rawsize * 9 + 8) / 8;
>>>>>>
>>>>>>I'm not very familiar with pglz internals, but I'm a bit puzzled by
>>>>>>this. My first instinct was to compare it to this:
>>>>>>
>>>>>>#define PGLZ_MAX_OUTPUT(_dlen)    ((_dlen) + 4)
>>>>>>
>>>>>>but clearly that's a very different (much simpler) formula. So why
>>>>>>shouldn't pglz_maximum_compressed_size simply use this macro?
>>>>
>>>>>
>>>>>compressed_size accounts for possible increase of size during
>>>>>compression. pglz can consume up to 1 control byte for each 8 bytes of
>>>>>data in worst case.
>>>>
>>>>OK, but does that actually translate in to the formula? We essentially
>>>>need to count 8-byte chunks in raw data, and multiply that by 9. Which
>>>>gives us something like
>>>>
>>>>nchunks = ((rawsize + 7) / 8) * 9;
>>>>
>>>>which is not quite what the patch does.
>>>
>>>I'm afraid neither formula is correct, but all this is hair-splitting differences.
>>>
>>
>>Sure. I just want to be sure the formula is safe and we won't end up
>>using too low value in some corner case.
>>
>>>Your formula does not account for the fact that we may not need all bytes from last chunk.
>>>Consider desired decompressed size of 3 bytes. We may need 1 control byte and 3 literals, 4 bytes total
>>>But nchunks = 9.
>>>
>>
>>OK, so essentially this means my formula works with whole chunks, i.e.
>>if we happen to need just a part of a decompressed chunk, we still
>>request enough data to decompress it whole. This way we may request up
>>to 7 extra bytes, which seems fine.
>>
>>>Binguo's formula is appending 1 control bit per data byte and one extra
>>>control byte.  Consider size = 8 bytes. We need 1 control byte, 8
>>>literals, 9 total.  But compressed_size = 10.
>>>
>>>Mathematically correct formula is compressed_size = (int32) ((int64)
>>>rawsize * 9 + 7) / 8; Here we take one bit for each data byte, and 7
>>>control bits for overflow.
>>>
>>>But this equations make no big difference, each formula is safe. I'd
>>>pick one which is easier to understand and document (IMO, its nchunks =
>>>((rawsize + 7) / 8) * 9).
>>>
>>
>>I'd use the *mathematically correct* formula, it doesn't seem to be any
>>more complex, and the "one bit per byte, complete bytes" explanation
>>seems quite understandable.
>>
>
>Pushed.
>
>I've ended up using the *mathematically correct* formula, hopefully
>with sufficient explanation why it's correct. I've also polished a
>couple more comments, and pushed like that.
>
>Thanks to Binguo Bao for this improvement, and all the reviewers in this
>thread.
>
Hmmm, this seems to trigger a failure on thorntail, which is a sparc64
machine (and it seems to pass on all x86 machines, so far). Per the
backtrace, it seems to have failed like this:
    Core was generated by `postgres: parallel worker for PID 2341                                        '.
    Program terminated with signal SIGUSR1, User defined signal 1.
    #0  heap_tuple_untoast_attr_slice (attr=<optimized out>, sliceoffset=<optimized out>, slicelength=<optimized out>)
at/home/nm/farm/sparc64_deb10_gcc_64_ubsan/HEAD/pgsql.build/../pgsql/src/backend/access/common/detoast.c:235
 
    235                max_size = pglz_maximum_compressed_size(sliceoffset + slicelength,
    #0   heap_tuple_untoast_attr_slice (attr=<optimized out>, sliceoffset=<optimized out>, slicelength=<optimized out>)
at/home/nm/farm/sparc64_deb10_gcc_64_ubsan/HEAD/pgsql.build/../pgsql/src/backend/access/common/detoast.c:235
 
    #1  0x00000100003d4ae8 in ExecInterpExpr (state=0x10000d02298, econtext=0x10000d01510, isnull=0x7feffb2fd1f) at
/home/nm/farm/sparc64_deb10_gcc_64_ubsan/HEAD/pgsql.build/../pgsql/src/backend/executor/execExprInterp.c:690
    ...
so likely on this line:
    max_size = pglz_maximum_compressed_size(sliceoffset + slicelength,
                                            TOAST_COMPRESS_SIZE(attr));
the offset+length is just intereger arithmetics, so I don't see why that
would fail. So it has to be TOAST_COMPRESS_SIZE, which is defined like
this:
    #define TOAST_COMPRESS_SIZE(ptr)  ((int32) VARSIZE(ptr) - TOAST_COMPRESS_HDRSZ)
I wonder if that's wrong, somehow ... Maybe it should use VARSIZE_ANY,
but then how would it work on any platform and only fail on sparc64?
regards
-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 
			
		Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> Hmmm, this seems to trigger a failure on thorntail, which is a sparc64
> machine (and it seems to pass on all x86 machines, so far).
gharial's not happy either, and I bet if you wait a bit longer you'll
see the same on other big-endian machines.
> I wonder if that's wrong, somehow ... Maybe it should use VARSIZE_ANY,
> but then how would it work on any platform and only fail on sparc64?
Maybe it accidentally seems to work on little-endian, thanks to the
different definitions of varlena headers?
            regards, tom lane
			
		On Tue, Oct 01, 2019 at 10:10:37AM -0400, Tom Lane wrote: >Tomas Vondra <tomas.vondra@2ndquadrant.com> writes: >> Hmmm, this seems to trigger a failure on thorntail, which is a sparc64 >> machine (and it seems to pass on all x86 machines, so far). > >gharial's not happy either, and I bet if you wait a bit longer you'll >see the same on other big-endian machines. > >> I wonder if that's wrong, somehow ... Maybe it should use VARSIZE_ANY, >> but then how would it work on any platform and only fail on sparc64? > >Maybe it accidentally seems to work on little-endian, thanks to the >different definitions of varlena headers? > Maybe. Let's see if just using VARSIZE_ANY does the trick. If not, I'll investigate further. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> On Tue, Oct 01, 2019 at 10:10:37AM -0400, Tom Lane wrote:
>> Maybe it accidentally seems to work on little-endian, thanks to the
>> different definitions of varlena headers?
> Maybe. Let's see if just using VARSIZE_ANY does the trick. If not, I'll
> investigate further.
FWIW, prairiedog got past that test, so whatever it is seems specific
to big-endian 64-bit.  Odd.
            regards, tom lane
			
		Today I noticed strange behaviour, consider the following test:
postgres@126111=#create table foo ( a text );
CREATE TABLE
postgres@126111=#insert into foo values ( repeat('PostgreSQL is the world''s best database and leading by an Open Source Community.', 8000));
INSERT 0 1
CREATE TABLE
postgres@126111=#insert into foo values ( repeat('PostgreSQL is the world''s best database and leading by an Open Source Community.', 8000));
INSERT 0 1
postgres@126111=#select substring(a from 639921 for 81) from foo;
substring
-----------
 
(1 row)
substring
-----------
(1 row)
Before below commit:
commit 540f31680913b4e11f2caa40cafeca269cfcb22f
Author: Tomas Vondra <tomas.vondra@postgresql.org>
Date: Tue Oct 1 16:53:04 2019 +0200
Blind attempt to fix pglz_maximum_compressed_size
    
Commit 11a078cf87 triggered failures on big-endian machines, and the
only plausible place for an issue seems to be that TOAST_COMPRESS_SIZE
calls VARSIZE instead of VARSIZE_ANY. So try fixing that blindly.
    
Discussion: https://www.postgresql.org/message-id/20191001131803.j6uin7nho7t6vxzy%40development
Author: Tomas Vondra <tomas.vondra@postgresql.org>
Date: Tue Oct 1 16:53:04 2019 +0200
Blind attempt to fix pglz_maximum_compressed_size
Commit 11a078cf87 triggered failures on big-endian machines, and the
only plausible place for an issue seems to be that TOAST_COMPRESS_SIZE
calls VARSIZE instead of VARSIZE_ANY. So try fixing that blindly.
Discussion: https://www.postgresql.org/message-id/20191001131803.j6uin7nho7t6vxzy%40development
postgres@75761=#select substring(a from 639921 for 81) from foo;
                                    substring                                     
----------------------------------------------------------------------------------
PostgreSQL is the world's best database and leading by an Open Source Community.
(1 row)
----------------------------------------------------------------------------------
PostgreSQL is the world's best database and leading by an Open Source Community.
(1 row)
On Thu, Nov 14, 2019 at 03:27:42PM +0530, Rushabh Lathia wrote:
>Today I noticed strange behaviour, consider the following test:
>
>postgres@126111=#create table foo ( a text );
>CREATE TABLE
>postgres@126111=#insert into foo values ( repeat('PostgreSQL is the
>world''s best database and leading by an Open Source Community.', 8000));
>INSERT 0 1
>
>postgres@126111=#select substring(a from 639921 for 81) from foo;
> substring
>-----------
>
>(1 row)
>
Hmmm. I think the issue is heap_tuple_untoast_attr_slice is using the
wrong way to determine compressed size in the VARATT_IS_EXTERNAL_ONDISK
branch. It does this
     max_size = pglz_maximum_compressed_size(sliceoffset + slicelength,
                                             TOAST_COMPRESS_SIZE(attr));
But for the example you've posted TOAST_COMPRESS_SIZE(attr) returns 10,
which is obviously bogus because the TOAST table contains ~75kB of data.
I think it should be doing this instead:
     max_size = pglz_maximum_compressed_size(sliceoffset + slicelength,
                                             toast_pointer.va_extsize);
At least that fixes it for me.
I wonder if this actually explains the crashes 540f3168091 was supposed
to fix, but it just masked them instead.
regards
-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
			
		Вложения
On Thu, Nov 14, 2019 at 6:30 PM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
On Thu, Nov 14, 2019 at 03:27:42PM +0530, Rushabh Lathia wrote:
>Today I noticed strange behaviour, consider the following test:
>
>postgres@126111=#create table foo ( a text );
>CREATE TABLE
>postgres@126111=#insert into foo values ( repeat('PostgreSQL is the
>world''s best database and leading by an Open Source Community.', 8000));
>INSERT 0 1
>
>postgres@126111=#select substring(a from 639921 for 81) from foo;
> substring
>-----------
>
>(1 row)
>
Hmmm. I think the issue is heap_tuple_untoast_attr_slice is using the
wrong way to determine compressed size in the VARATT_IS_EXTERNAL_ONDISK
branch. It does this
max_size = pglz_maximum_compressed_size(sliceoffset + slicelength,
TOAST_COMPRESS_SIZE(attr));
But for the example you've posted TOAST_COMPRESS_SIZE(attr) returns 10,
which is obviously bogus because the TOAST table contains ~75kB of data.
I think it should be doing this instead:
max_size = pglz_maximum_compressed_size(sliceoffset + slicelength,
toast_pointer.va_extsize);
At least that fixes it for me.
I wonder if this actually explains the crashes 540f3168091 was supposed
to fix, but it just masked them instead.
I tested the attached patch and that fixes the issue for me.
Thanks,
--
Rushabh Lathia