Discussion: [HACKERS] pgsql 10: hash indexes testing

[HACKERS] pgsql 10: hash indexes testing

From
AP
Date:
Hi,

As I am actively working on a big project I figured I'd give PGSQL 10 a
go, primarily because of hash indexes.

PostgreSQL 10 version in use is: 10~beta2~20170620.2224-1~491.gitd412f79.pgdg+1

Things are mostly well with hash indexes (plain, non-unique) giving me
a rather lovely saving in index size: they are roughly 1/4 to 1/2 the
size of btree.  This is of use as the btree indexes can be between 25GB
and 350GB in size.

There is one index that caused an issue. Towards the end of an import
I got the following error:

out of overflow pages in hash index

The data being indexed is BYTEA, (quasi)random and 64 bytes in size.
The table has over 2 billion entries. The data is not unique. There's
an average of 10 duplicates for every unique value.

Is this a valid error message or have I spotted a bug?

I tried to duplicate the error with a reindex of the data set that I
uploaded using btree for that index but I couldn't. The difference, as
near as I can remember, is different (quasi)random values being indexed
(still same level of duplication).

I'll see if I can get the table reloaded a-fresh to see if that's what
it takes to trigger the error. The upload is done in a single transaction
with COPY BINARY of 3-4 tables, one after the other in a loop until data
is exhausted (ie: COPY into table A, B, C and D and then back to A and
repeat - there is data processing happening and this helps keep memory
usage in check).

A single transaction can COPY millions of rows (average is about 3.7
million rows with the table in question getting 3 million). There are
33 transactions in play at any one time and they all upload to the same
tables.  Last things done in a transaction are a couple of small,
single-row INSERTs into a couple of tables and then COMMIT. There is
one transaction per connection. The whole process can take 12-15 hours
and involves 1000 transactions.

Hopefully it's not a specific set of random values that generates
the error cos duplicating THAT will be outpaced by the death of the
universe. :)

AP



Re: [HACKERS] pgsql 10: hash indexes testing

From
Amit Kapila
Date:
On Tue, Jul 4, 2017 at 4:27 PM, AP <ap@zip.com.au> wrote:
> Hi,
>
> As I am actively working on a big project I figured I'd give PGSQL 10 a
> go, primarily because of hash indexes.
>
> PostgreSQL 10 version in use is: 10~beta2~20170620.2224-1~491.gitd412f79.pgdg+1
>
> Things are mostly well with hash indexes (plain, non-unique) giving me
> a rather lovely saving in index size: they are roughly 1/4 to 1/2 the
> size of btree.  This is of use as the btree indexes can be between 25GB
> and 350GB in size.
>
> There is one index that caused an issue. Towards the end of an import
> I got the following error:
>
> out of overflow pages in hash index
>
> The data being indexed is BYTEA, (quasi)random and 64 bytes in size.
> The table has over 2 billion entries. The data is not unique. There's
> an average of 10 duplicates for every unique value.
>
> Is this a valid error message or have I spotted a bug?
>

It is difficult to say at this stage, but I think we can figure out.
We can get such a message if we consume the maximum overflow pages
that hash index can support which is limited by a number of
bitmappages.  Can you try to use pgstattuple extension and get us the
results of Select * from pgstathashindex('index_name');?  If the
number of bitmappages is 128 and total overflow pages are 128 * 4096,
then that would mean that all the pages are used.  Then maybe we can
try to see if all the pages are full with the help of pageinspect, but
I think we don't need to go there at this stage.

> I tried to duplicate the error with a reindex of the data set that I
> uploaded using btree for that index but I couldn't. The difference, as
> near as I can remember, is different (quasi)random values being indexed
> (still same level of duplication).
>

One probable theory is that in the original index where you hit the
error, there are some unused overflow pages in some of the buckets
which can be squeezed.  The reason those didn't get squeezed is that
Vacuum wouldn't have been kicked off on this workload.

> I'll see if I can get the table reloaded a-fresh to see if that's what
> it takes to trigger the error.
>

Thanks.  I suggest when an error occurs, don't throw away that index
because we can get some useful information from it to diagnose the
reason of error.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] pgsql 10: hash indexes testing

From
Jeff Janes
Date:
On Tue, Jul 4, 2017 at 3:57 AM, AP <ap@zip.com.au> wrote:

The data being indexed is BYTEA, (quasi)random and 64 bytes in size.
The table has over 2 billion entries. The data is not unique. There's
an average of 10 duplicates for every unique value.

What is the number of duplicates for the most common value?
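
For reference, a sketch of one way to get that number, assuming the table and
column names that appear elsewhere in this thread (link, datum_id):

-- Most common value and how many times it appears (slow on a 2+ billion row table):
SELECT datum_id, count(*) AS dup_count
FROM link
GROUP BY datum_id
ORDER BY dup_count DESC
LIMIT 1;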

Cheers,

Jeff

Re: [HACKERS] pgsql 10: hash indexes testing

From
AP
Date:
On Wed, Jul 05, 2017 at 08:10:10AM +0530, Amit Kapila wrote:
> On Tue, Jul 4, 2017 at 4:27 PM, AP <ap@zip.com.au> wrote:
> > There is one index that caused an issue. Towards the end of an import
> > I got the following error:
> >
> > out of overflow pages in hash index
> >
> > The data being indexed is BYTEA, (quasi)random and 64 bytes in size.
> > The table has over 2 billion entries. The data is not unique. There's
> > an average of 10 duplicates for every unique value.
> >
> > Is this a valid error message or have I spotted a bug?
> 
> It is difficult to say at this stage, but I think we can figure out.
> We can get such a message if we consume the maximum overflow pages
> that hash index can support which is limited by a number of
> bitmappages.  Can you try to use pgstattuple extension and get us the
> results of Select * from pgstathashindex('index_name');?  If the
> number of bitmappages is 128 and total overflow pages are 128 * 4096,
> then that would mean that all the pages are used.  Then maybe we can

Hmm. Unless I misunderstood that'd mean that overflow_pages/4096 should
result in a number <= 128 at the moment, right? If so then something is
amiss:

# select * from pgstathashindex('link_datum_id_hash_idx');
 version | bucket_pages | overflow_pages | bitmap_pages | unused_pages | live_items | dead_items |   free_percent
---------+--------------+----------------+--------------+--------------+------------+------------+------------------
       3 |     10485760 |        2131192 |           66 |            0 | 2975444240 |          0 | 1065.19942179026
(1 row)

oldmdstash=# select 2131192/4096;
 ?column?
----------
      520
(1 row)

And I do appear to have an odd percentage of free space. :)

This index was created yesterday so it has been around for maybe 18 hours.
Autovac is likely to have hit it by now.

> > I'll see if I can get the table reloaded a-fresh to see if that's what
> > it takes to trigger the error.
> 
> Thanks.  I suggest when an error occurs, don't throw away that index
> because we can get some useful information from it to diagnose the
> reason of error.

I'll try and set this up now.

AP



Re: [HACKERS] pgsql 10: hash indexes testing

From
Amit Kapila
Date:
On Wed, Jul 5, 2017 at 9:53 AM, AP <ap@zip.com.au> wrote:
> On Wed, Jul 05, 2017 at 08:10:10AM +0530, Amit Kapila wrote:
>> On Tue, Jul 4, 2017 at 4:27 PM, AP <ap@zip.com.au> wrote:
>> > There is one index that caused an issue. Towards the end of an import
>> > I got the following error:
>> >
>> > out of overflow pages in hash index
>> >
>> > The data being indexed is BYTEA, (quasi)random and 64 bytes in size.
>> > The table has over 2 billion entries. The data is not unique. There's
>> > an average of 10 duplicates for every unique value.
>> >
>> > Is this a valid error message or have I spotted a bug?
>>
>> It is difficult to say at this stage, but I think we can figure out.
>> We can get such a message if we consume the maximum overflow pages
>> that hash index can support which is limited by a number of
>> bitmappages.  Can you try to use pgstattuple extension and get us the
>> results of Select * from pgstathashindex('index_name');?  If the
>> number of bitmappages is 128 and total overflow pages are 128 * 4096,
>> then that would mean that all the pages are used.  Then maybe we can
>
> Hmm. Unless I misunderstood that'd mean that overflow_pages/4096 should
> result in a number <= 128 at the moment, right?
>

No, sorry, I think my calculation above has something missing.  It
should be 128 * 4096 * 8.  How we can compute this number is
no_bitmap_pages * no_bits_used_to_represent_overflow_pages.

In each bitmap page (8K), we can use 4K (to meet power of 2 criteria)
to represent overflow pages.  We use one bit to represent each
overflow page. Also, you need to include bitmap pages in overflow
pages as we consider those as a subset of the overflow pages.
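
To make the arithmetic concrete, here is a rough capacity check against the
pgstathashindex() output, assuming the default 8kB block size (4096 bytes of
each bitmap page, i.e. 4096 * 8 bits, track overflow pages):

-- How close the index is to the overflow-page limit (one bit per overflow page):
SELECT overflow_pages,
       bitmap_pages,
       bitmap_pages * 4096 * 8 AS max_overflow_pages,
       round(100.0 * overflow_pages / (bitmap_pages * 4096 * 8), 2) AS pct_of_limit
FROM pgstathashindex('link_datum_id_hash_idx');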

>If so then something is
> amiss:
>
> # select * from  pgstathashindex('link_datum_id_hash_idx');
>  version | bucket_pages | overflow_pages | bitmap_pages | unused_pages | live_items | dead_items |   free_percent
> ---------+--------------+----------------+--------------+--------------+------------+------------+------------------
>        3 |     10485760 |        2131192 |           66 |            0 | 2975444240 |          0 | 1065.19942179026
> (1 row)
>
> oldmdstash=# select 2131192/4096;
>  ?column?
> ----------
>       520
> (1 row)
>

You need to divide 520 by 8 to get the bitmap page.  Is this the index
in which you get the error or is this the one on which you have done
REINDEX?

> And I do appear to have an odd percentage of free space. :)
>

It looks like Vacuum hasn't been triggered.

> This index was created yesterday so it has been around for maybe 18 hours.
> Autovac is likely to have hit it by now.
>

Do you have any deletes?  How have you verified whether autovacuum has
been triggered or not?
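
For reference, one quick check is the cumulative statistics view; a sketch,
assuming the table name used in this thread:

-- Last (auto)vacuum activity recorded for the table:
SELECT relname, last_vacuum, last_autovacuum, vacuum_count, autovacuum_count
FROM pg_stat_user_tables
WHERE relname = 'link';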



-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] pgsql 10: hash indexes testing

From
AP
Date:
On Wed, Jul 05, 2017 at 10:29:09AM +0530, Amit Kapila wrote:
> >> bitmappages.  Can you try to use pgstattuple extension and get us the
> >> results of Select * from pgstathashindex('index_name');?  If the
> >> number of bitmappages is 128 and total overflow pages are 128 * 4096,
> >> then that would mean that all the pages are used.  Then maybe we can
> >
> > Hmm. Unless I misunderstood that'd mean that overflow_pages/4096 should
> > result in a number <= 128 at the moment, right?
> 
> No, sorry, I think my calculation above has something missing.  It
> should be 128 * 4096 * 8.  How we can compute this number is
> no_bitmap_pages * no_bits_used_to_represent_overflow_pages.

AHA! Ok. Then that appears to match. I get 65.041.

> >If so then something is
> > amiss:
> >
> > # select * from  pgstathashindex('link_datum_id_hash_idx');
> >  version | bucket_pages | overflow_pages | bitmap_pages | unused_pages | live_items | dead_items |   free_percent
> > ---------+--------------+----------------+--------------+--------------+------------+------------+------------------
> >        3 |     10485760 |        2131192 |           66 |            0 | 2975444240 |          0 | 1065.19942179026
> > (1 row)
> >
> > oldmdstash=# select 2131192/4096;
> >  ?column?
> > ----------
> >       520
> > (1 row)
> 
> You need to divide 520 by 8 to get the bitmap page.  Is this the index
> in which you get the error or is this the one on which you have done
> REINDEX?

Post REINDEX.

> > And I do appear to have an odd percentage of free space. :)
> >
> 
> It looks like Vacuum hasn't been triggered.

:(

> > This index was created yesterday so it has been around for maybe 18 hours.
> > Autovac is likely to have hit it by now.
> 
> Do you have any deletes?  How have you verified whether autovacuum has

No DELETEs. Just the initial COPY, then SELECTs, then a DB rename to get it
out of the way of other testing, then the REINDEX.

> been triggered or not?

I just checked pg_stat_user_tables (which I hope is the right place for
this info :)
   relid   | schemaname | relname | seq_scan | seq_tup_read | idx_scan | idx_tup_fetch | n_tup_ins  | n_tup_upd | n_tup_del | n_tup_hot_upd | n_live_tup | n_dead_tup | n_mod_since_analyze | last_vacuum | last_autovacuum |         last_analyze          |       last_autoanalyze        | vacuum_count | autovacuum_count | analyze_count | autoanalyze_count
-----------+------------+---------+----------+--------------+----------+---------------+------------+-----------+-----------+---------------+------------+------------+---------------------+-------------+-----------------+-------------------------------+-------------------------------+--------------+------------------+---------------+-------------------
 129311803 | public     | link    |       70 |  15085880072 |     5779 |        465623 | 2975444240 |         0 |         0 |             0 |  928658178 |          0 |                   0 |             |                 |                               | 2017-06-28 10:43:51.273241+10 |            0 |                0 |             0 |                 2
So it appears not.

# show autovacuum;
 autovacuum
------------
 on
(1 row)

All autovacuum parameters are as per default. The autovacuum launcher process
exists.

:(

AP



Re: [HACKERS] pgsql 10: hash indexes testing

From
AP
Date:
On Wed, Jul 05, 2017 at 03:33:45PM +1000, AP wrote:
> > Do you have any deletes?  How have you verified whether autovacuum has
> 
> No DELETEs. Just the initial COPY, then SELECTs, then a DB rename to get it
> out of the way of other testing, then the REINDEX.
> 
> > been triggered or not?
> 
> I just checked pg_stat_user_tables (which I hope is the right place for
> this info :)
> 
>    relid   | schemaname | relname | seq_scan | seq_tup_read | idx_scan | idx_tup_fetch | n_tup_ins  | n_tup_upd | n_tup_del | n_tup_hot_upd | n_live_tup | n_dead_tup | n_mod_since_analyze | last_vacuum | last_autovacuum |         last_analyze          |       last_autoanalyze        | vacuum_count | autovacuum_count | analyze_count | autoanalyze_count
> -----------+------------+---------+----------+--------------+----------+---------------+------------+-----------+-----------+---------------+------------+------------+---------------------+-------------+-----------------+-------------------------------+-------------------------------+--------------+------------------+---------------+-------------------
>  129311803 | public     | link    |       70 |  15085880072 |     5779 |        465623 | 2975444240 |         0 |         0 |             0 |  928658178 |          0 |                   0 |             |                 |                               | 2017-06-28 10:43:51.273241+10 |            0 |                0 |             0 |                 2
> 
> So it appears not.

Actually, after a bit more late-arvo thought, I take it this would be the
case as the table has not changed since creation. Thus no need to autovac.

I've newer timestamps on the live db (whose data was uploaded more recently)
for its tables so I think autovac is functioning.

AP



Re: [HACKERS] pgsql 10: hash indexes testing

From
AP
Date:
On Tue, Jul 04, 2017 at 08:23:20PM -0700, Jeff Janes wrote:
> On Tue, Jul 4, 2017 at 3:57 AM, AP <ap@zip.com.au> wrote:
> > The data being indexed is BYTEA, (quasi)random and 64 bytes in size.
> > The table has over 2 billion entries. The data is not unique. There's
> > an average of 10 duplicates for every unique value.
> 
> What is the number of duplicates for the most common value?

Damn. Was going to collect this info as I was doing a fresh upload but
it fell through the cracks of my mind. It'll probably take at least
half a day to collect (a simple count(*) on the table takes 1.5-1.75
hours parallelised across 11 processes) so I'll probably have this in
around 24 hours if all goes well. (and I don't stuff up the SQL :) )

AP.



Re: [HACKERS] pgsql 10: hash indexes testing

From
Amit Kapila
Date:
On Wed, Jul 5, 2017 at 11:03 AM, AP <ap@zip.com.au> wrote:
> On Wed, Jul 05, 2017 at 10:29:09AM +0530, Amit Kapila wrote:
>> >> bitmappages.  Can you try to use pgstattuple extension and get us the
>> >> results of Select * from pgstathashindex('index_name');?  If the
>> >> number of bitmappages is 128 and total overflow pages are 128 * 4096,
>> >> then that would mean that all the pages are used.  Then maybe we can
>> >
>> > Hmm. Unless I misunderstood that'd mean that overflow_pages/4096 should
>> > result in a number <= 128 at the moment, right?
>>
>> No, sorry, I think my calculation above has something missing.  It
>> should be 128 * 4096 * 8.  How we can compute this number is
>> no_bitmap_pages * no_bits_used_to_represent_overflow_pages.
>
> AHA! Ok. Then that appears to match. I get 65.041.
>
>> >If so then something is
>> > amiss:
>> >
>> > # select * from  pgstathashindex('link_datum_id_hash_idx');
>> >  version | bucket_pages | overflow_pages | bitmap_pages | unused_pages | live_items | dead_items |   free_percent
>> > ---------+--------------+----------------+--------------+--------------+------------+------------+------------------
>> >        3 |     10485760 |        2131192 |           66 |            0 | 2975444240 |          0 | 1065.19942179026
>> > (1 row)
>> >
>> > oldmdstash=# select 2131192/4096;
>> >  ?column?
>> > ----------
>> >       520
>> > (1 row)
>>
>> You need to divide 520 by 8 to get the bitmap page.  Is this the index
>> in which you get the error or is this the one on which you have done
>> REINDEX?
>
> Post REINDEX.
>
>> > And I do appear to have an odd percentage of free space. :)

Are you worried about "unused_pages"? If so, then this is not a major
reason to worry, because these are probably freed overflow pages which
can be used in future.  In the hash index, when we free the overflow
pages, they are not returned back to OS, rather they are tracked in
the index as unused pages which will get used when required in future.
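
As an aside, that bitmap tracking can be inspected directly; a hypothetical
sketch using the pageinspect extension shipped with PostgreSQL 10 (the block
number is only an example and must refer to an overflow page of the index):

CREATE EXTENSION IF NOT EXISTS pageinspect;

-- Which bitmap page/bit tracks this overflow page, and whether the bit is set:
SELECT bitmapblkno, bitmapbit, bitstatus
FROM hash_bitmap_info('link_datum_id_hash_idx', 12345);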

>> >
>>
>> It looks like Vacuum hasn't been triggered.
>>

Vacuum won't be triggered on insert load.  I think that is one of the
reasons why in your initial copy, you might have got the error.  We
had some discussion in the past to trigger Vacuum on insert heavy
workloads [1], but the patch still didn't get committed.  I think if
that patch or some other form of that patch gets committed, it will
help the workload what you are trying here.


[1] - https://www.postgresql.org/message-id/b970f20f-f096-2d3a-6c6d-ee887bd30cfb%402ndquadrant.fr

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] pgsql 10: hash indexes testing

From
AP
Date:
On Wed, Jul 05, 2017 at 05:52:32PM +0530, Amit Kapila wrote:
> >> >  version | bucket_pages | overflow_pages | bitmap_pages | unused_pages | live_items | dead_items |   free_percent
> >> > ---------+--------------+----------------+--------------+--------------+------------+------------+------------------
> >> >        3 |     10485760 |        2131192 |           66 |            0 | 2975444240 |          0 | 1065.19942179026
> >> > (1 row)
...
> >> > And I do appear to have an odd percentage of free space. :)
> 
> Are you worried about "unused_pages"? If so, then this is not a major

Nope. "free_percent" Just a bit weird that I have it at over 1000% free. :)
Shouldn't that number be < 100?

> reason to worry, because these are probably freed overflow pages which
> can be used in future.  In the hash index, when we free the overflow
> pages, they are not returned back to OS, rather they are tracked in
> the index as unused pages which will get used when required in future.

> >> It looks like Vacuum hasn't been triggered.
> 
> Vacuum won't be triggered on insert load.  I think that is one of the
> reasons why in your initial copy, you might have got the error.  We
> had some discussion in the past to trigger Vacuum on insert heavy
> workloads [1], but the patch still didn't get committed.  I think if
> that patch or some other form of that patch gets committed, it will
> help the workload what you are trying here.

Well, if this is the cause of my little issue, it might be nice. ATM
my import script bombs out on errors (that I've duplicated! :) It took
11 hours but it bombed) and it sounds like I'll need to do a manual
VACUUM before it can be run again.

The stats you were looking for before are:

# select * from pgstathashindex('link_datum_id_idx');
 version | bucket_pages | overflow_pages | bitmap_pages | unused_pages | live_items | dead_items |   free_percent
---------+--------------+----------------+--------------+--------------+------------+------------+------------------
       3 |      8559258 |        4194176 |          128 |      1926502 | 3591812743 |          0 | 942.873199357466
(1 row)

# select 4194176.0/128/8;
       ?column?
-----------------------
 4095.8750000000000000
(1 row)

If you need more info or whatnot, shout. I've a problematic index to
play with now.

> [1] - https://www.postgresql.org/message-id/b970f20f-f096-2d3a-6c6d-ee887bd30cfb%402ndquadrant.fr

AP



Re: [HACKERS] pgsql 10: hash indexes testing

From
Amit Kapila
Date:
On Thu, Jul 6, 2017 at 2:40 AM, AP <ap@zip.com.au> wrote:
> On Wed, Jul 05, 2017 at 05:52:32PM +0530, Amit Kapila wrote:
>> >> >  version | bucket_pages | overflow_pages | bitmap_pages | unused_pages | live_items | dead_items |   free_percent
>> >> > ---------+--------------+----------------+--------------+--------------+------------+------------+------------------
>> >> >        3 |     10485760 |        2131192 |           66 |            0 | 2975444240 |          0 | 1065.19942179026
>> >> > (1 row)
> ...
>> >> > And I do appear to have an odd percentage of free space. :)
>>
>> Are you worried about "unused_pages"? If so, then this is not a major
>
> Nope. "free_percent" Just a bit weird that I have it at over 1000% free. :)
> Shouldn't that number be < 100?
>

Yes, there seems to be some gotcha in free percent calculation.  Is it
possible for you to debug or in some way share the test?

>> reason to worry, because these are probably freed overflow pages which
>> can be used in future.  In the hash index, when we free the overflow
>> pages, they are not returned back to OS, rather they are tracked in
>> the index as unused pages which will get used when required in future.
>
>> >> It looks like Vacuum hasn't been triggered.
>>
>> Vacuum won't be triggered on insert load.  I think that is one of the
>> reasons why in your initial copy, you might have got the error.  We
>> had some discussion in the past to trigger Vacuum on insert heavy
>> workloads [1], but the patch still didn't get committed.  I think if
>> that patch or some other form of that patch gets committed, it will
>> help the workload what you are trying here.
>
> Well, if this is the cause of my little issue, it might be nice. ATM
> my import script bombs out on errors (that I've duplicated! :) It took
> 11 hours but it bombed) and it sounds like I'll need to do a manual
> VACUUM before it can be run again.
>

Yeah, I think after manual vacuum you should be able to proceed.

> The stats you were looking for before are:
>
> # select * from  pgstathashindex('link_datum_id_idx');
>  version | bucket_pages | overflow_pages | bitmap_pages | unused_pages | live_items | dead_items |   free_percent
> ---------+--------------+----------------+--------------+--------------+------------+------------+------------------
>        3 |      8559258 |        4194176 |          128 |      1926502 | 3591812743 |          0 | 942.873199357466
> (1 row)
>
> # select 4194176.0/128/8;
>        ?column?
> -----------------------
>  4095.8750000000000000
> (1 row)
>

From above stats, it is clear that you have hit the maximum number of
overflow pages we can support today.  Now, here one can argue that we
should increase the limit of overflow pages in hash index which we can
do, but I think you can again hit such a problem after some more time.
So at this stage, there are two possibilities for you (a) run manual
Vacuum in-between (b) create the index after bulk load.  In general,
whatever I have mentioned in (b) is a better way for bulk loading.
Note here again the free_percent seems to be wrong.
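
For option (b), a rough sketch of the load order, reusing the index definition
that appears later in this thread (the bulk COPY itself is elided):

-- Load first, build the hash index afterwards:
DROP INDEX IF EXISTS link_datum_id_idx;
-- ... bulk COPY into link runs here ...
CREATE INDEX link_datum_id_idx ON link USING hash (datum_id) WITH (fillfactor = 90);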


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] pgsql 10: hash indexes testing

From
AP
Date:
On Thu, Jul 06, 2017 at 08:52:03AM +0530, Amit Kapila wrote:
> On Thu, Jul 6, 2017 at 2:40 AM, AP <ap@zip.com.au> wrote:
> > On Wed, Jul 05, 2017 at 05:52:32PM +0530, Amit Kapila wrote:
> >> >> >  version | bucket_pages | overflow_pages | bitmap_pages | unused_pages | live_items | dead_items |   free_percent
> >> >> > ---------+--------------+----------------+--------------+--------------+------------+------------+------------------
> >> >> >        3 |     10485760 |        2131192 |           66 |            0 | 2975444240 |          0 | 1065.19942179026
> >> >> > (1 row)
> > ...
> >> >> > And I do appear to have an odd percentage of free space. :)
> >>
> >> Are you worried about "unused_pages"? If so, then this is not a major
> >
> > Nope. "free_percent" Just a bit weird that I have it at over 1000% free. :)
> > Shouldn't that number be < 100?
> 
> Yes, there seems to be some gotcha in free percent calculation.  Is it
> possible for you to debug or in some way share the test?

I can try to debug but I need to know what to look for and how. If it
requires data reloads then that's around 12-15 hours per hit.

As for sharing the test, that'd mean sharing the data. If it helps I can
provide the content of that column but you're looking at an sql dump that
is roughly (2*64+1)*2.3 billion (give or take a (few) billion) in size. :)

> > Well, if this is the cause of my little issue, it might be nice. ATM
> > my import script bombs out on errors (that I've duplicated! :) It took
> > 11 hours but it bombed) and it sounds like I'll need to do a manual
> > VACUUM before it can be run again.
> >
> 
> Yeah, I think after manual vacuum you should be able to proceed.

I don't think that'll help. I did a small check which I hope is helpful in
seeing if it will. Working off a similar db that completed (as it was
smaller and I did not want to mess with my one copy of the broken db)
I got the following results:

Pre-VACUUM:
-----------
# \di+
                               List of relations
 Schema |       Name        | Type  |   Owner   |  Table  |  Size   | Description
--------+-------------------+-------+-----------+---------+---------+-------------
 ...
 public | link_datum_id_idx | index | mdkingpin | link    | 90 GB   |
 ...

# select * from pgstathashindex('link_datum_id_idx');
 version | bucket_pages | overflow_pages | bitmap_pages | unused_pages | live_items | dead_items |  free_percent
---------+--------------+----------------+--------------+--------------+------------+------------+-----------------
       3 |      7611595 |        3451261 |          106 |       777013 | 3076131325 |          0 | 1512.8635780908

# vacuum VERBOSE ANALYZE link;
INFO:  vacuuming "public.link"
INFO:  "link": found 0 removable, 2272156152 nonremovable row versions in 120507771 out of 123729466 pages
DETAIL:  0 dead row versions cannot be removed yet, oldest xmin: 8594
There were 0 unused item pointers.
Skipped 0 pages due to buffer pins, 0 frozen pages.
0 pages are entirely empty.
CPU: user: 457.16 s, system: 755.84 s, elapsed: 4196.75 s.
INFO:  vacuuming "pg_toast.pg_toast_183727891"
INFO:  index "pg_toast_183727891_index" now contains 1441820 row versions in 3956 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU: user: 0.02 s, system: 0.00 s, elapsed: 0.56 s.
INFO:  "pg_toast_183727891": found 0 removable, 1441820 nonremovable row versions in 332271 out of 332271 pages
DETAIL:  0 dead row versions cannot be removed yet, oldest xmin: 8594
There were 0 unused item pointers.
Skipped 0 pages due to buffer pins, 0 frozen pages.
0 pages are entirely empty.
CPU: user: 1.16 s, system: 2.26 s, elapsed: 22.80 s.
INFO:  analyzing "public.link"
INFO:  "link": scanned 3000000 of 123729466 pages, containing 56661337 live rows and 0 dead rows; 3000000 rows in
sample,2330296882 estimated total rows
 
VACUUM    
Time: 7057484.079 ms (01:57:37.484)

Post-VACUUM:
------------
# \di+
 Schema |       Name        | Type  |   Owner   |  Table  |  Size   | Description
--------+-------------------+-------+-----------+---------+---------+-------------
 public | link_datum_id_idx | index | mdkingpin | link    | 90 GB   |

# select * from pgstathashindex('link_datum_id_idx');
 version | bucket_pages | overflow_pages | bitmap_pages | unused_pages | live_items | dead_items |  free_percent
---------+--------------+----------------+--------------+--------------+------------+------------+-----------------
       3 |      7611595 |        3451261 |          106 |       777013 | 3076131325 |          0 | 1512.8635780908

The end results are the same.

Then I did:
# CREATE INDEX CONCURRENTLY ON link USING hash (datum_id) WITH (fillfactor = 90);
CREATE INDEX
Time: 12545612.560 ms (03:29:05.613)
# \di+
                              List of relations
 Schema |        Name        | Type  |   Owner   |  Table  |  Size   | Description
--------+--------------------+-------+-----------+---------+---------+-------------
 public | link_datum_id_idx  | index | mdkingpin | link    | 90 GB   |
 public | link_datum_id_idx1 | index | mdkingpin | link    | 71 GB   |

# select * from pgstathashindex('link_datum_id_idx1');
 version | bucket_pages | overflow_pages | bitmap_pages | unused_pages | live_items | dead_items |   free_percent
---------+--------------+----------------+--------------+--------------+------------+------------+------------------
       3 |      7340032 |        2014699 |           62 |            0 | 2336759432 |          0 | 909.757757369414

That's markedly different. At a rough estimate I should be able to double
the row count before I hit the "real" limits of a hash index.

When you refer to VACUUM do you mean VACUUM FULL? That's going to get nasty
if that's the case as the table is almost 1TB in size.

> >From above stats, it is clear that you have hit the maximum number of
> overflow pages we can support today.  Now, here one can argue that we
> should increase the limit of overflow pages in hash index which we can
> do, but I think you can again hit such a problem after some more time.

True, though I'm not far off hitting it "for real" at the present limits
so an increase would be of benefit in other respects (ie not needing to
have as many tables to manage because we have to bundle a set off due
to index limits).

> So at this stage, there are two possibilities for you (a) run manual
> Vacuum in-between (b) create the index after bulk load.  In general,
> whatever I have mentioned in (b) is a better way for bulk loading.
> Note here again the free_percent seems to be wrong.

If you didn't mean VACUUM FULL then (a) does not appear to work and (b)
would kill usability of the db during import, which would happen daily
(though with a vastly reduced data size). It also messes with the 
permission model that has been set up for the least-trusted section of
the project (at the moment that section can only INSERT).

Still, I may wind up going with (b) if a VACUUM FULL is the only other
real choice but would prefer to avoid it. The fact that the index is
around 300GB smaller (so far) than btree may well be worth the pain
all by its lonesome.

AP



Re: [HACKERS] pgsql 10: hash indexes testing

From
Amit Kapila
Date:
On Thu, Jul 6, 2017 at 9:32 AM, AP <ap@zip.com.au> wrote:
> On Thu, Jul 06, 2017 at 08:52:03AM +0530, Amit Kapila wrote:
>> On Thu, Jul 6, 2017 at 2:40 AM, AP <ap@zip.com.au> wrote:
>> > On Wed, Jul 05, 2017 at 05:52:32PM +0530, Amit Kapila wrote:
>> >> >> >  version | bucket_pages | overflow_pages | bitmap_pages | unused_pages | live_items | dead_items |   free_percent
>> >> >> > ---------+--------------+----------------+--------------+--------------+------------+------------+------------------
>> >> >> >        3 |     10485760 |        2131192 |           66 |            0 | 2975444240 |          0 | 1065.19942179026
>> >> >> > (1 row)
>> > ...
>> >> >> > And I do appear to have an odd percentage of free space. :)
>> >>
>> >> Are you worried about "unused_pages"? If so, then this is not a major
>> >
>> > Nope. "free_percent" Just a bit weird that I have it at over 1000% free. :)
>> > Shouldn't that number be < 100?
>>
>> Yes, there seems to be some gotcha in free percent calculation.  Is it
>> possible for you to debug or in some way share the test?
>
> I can try to debug but I need to know what to look for and how.
>

Okay,  you need to debug function pgstathashindex and have your
breakpoint at free_percent calculation, then try to get the values of
nblocks, all the values in stats struct and total_space.  I think
after getting this info we can further decide what to look for.


> If it
> requires data reloads then that's around 12-15 hours per hit.
>
> As for sharing the test, that'd mean sharing the data. If it helps I can
> provide the content of that column but you're looking at an sql dump that
> is roughly (2*64+1)*2.3 billion (give or take a (few) billion) in size. :)
>

This is tricky, will I be able to import that column values by creating
table, if so, then probably it is worth.

>> > Well, if this is the cause of my little issue, it might be nice. ATM
>> > my import script bombs out on errors (that I've duplicated! :) It took
>> > 11 hours but it bombed) and it sounds like I'll need to do a manual
>> > VACUUM before it can be run again.
>> >
>>
>> Yeah, I think after manual vacuum you should be able to proceed.
>
>
> That's markedly different. At a rough estimate I should be able to double
> the row count before I hit the "real" limits of a hash index.
>
> When you refer to VACUUM do you mean VACUUM FULL?
>

Normal Vacuum won't work for this case as you don't have dead tuples
(deleted rows in table).

> That's going to get nasty
> if that's the case as the table is almost 1TB in size.
>

Yeah,  I think for this situation REINDEX will be a better option
because anyway Vacuum Full will rewrite the entire index and heap.

>> >From above stats, it is clear that you have hit the maximum number of
>> overflow pages we can support today.  Now, here one can argue that we
>> should increase the limit of overflow pages in hash index which we can
>> do, but I think you can again hit such a problem after some more time.
>
> True, though I'm not far off hitting it "for real" at the present limits
> so an increase would be of benefit in other respects (ie not needing to
> have as many tables to manage because we have to bundle a set off due
> to index limits).
>
>> So at this stage, there are two possibilities for you (a) run manual
>> Vacuum in-between (b) create the index after bulk load.  In general,
>> whatever I have mentioned in (b) is a better way for bulk loading.
>> Note here again the free_percent seems to be wrong.
>
> If you didn't mean VACUUM FULL then (a) does not appear to work and (b)
> would kill usability of the db during import, which would happen daily
> (though with a vastly reduced data size).
>

If the data size in subsequent import is very less, then you only need
to Create the index after first import and then let it continue like
that.

> It also messes with the
> permission model that has been set up for the least-trusted section of
> the project (at the moment that section can only INSERT).
>

As per your permission model Vacuum Full is allowed, but not Create index?

> Still, I may wind up going with (b) if a VACUUM FULL is the only other
> real choice but would prefer to avoid it. The fact that the index is
> around 300GB smaller (so far) than btree may well be worth the pain
> all by its lonesome.
>

As mentioned above REINDEX might be a better option.  I think for such
situation we should have some provision to allow squeeze functionality
of hash exposed to the user, this will be less costly than REINDEX and
might serve the purpose for the user.  Hey, can you try some hack in
the code which can at least tell us whether squeezing hash can give us
any benefit or is the index in such a situation that only REINDEX will
help.  Basically, while doing Vacuum (non-full), you need to somehow
bypass the below line in lazy_scan_heap
lazy_scan_heap
{
..
if (vacrelstats->num_dead_tuples > 0)
..
}

I am not sure it will work, but we can try once if you are okay.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] pgsql 10: hash indexes testing

From
AP
Date:
On Thu, Jul 06, 2017 at 12:38:38PM +0530, Amit Kapila wrote:
> On Thu, Jul 6, 2017 at 9:32 AM, AP <ap@zip.com.au> wrote:
> > On Thu, Jul 06, 2017 at 08:52:03AM +0530, Amit Kapila wrote:
> >> On Thu, Jul 6, 2017 at 2:40 AM, AP <ap@zip.com.au> wrote:
> >> > On Wed, Jul 05, 2017 at 05:52:32PM +0530, Amit Kapila wrote:
> >> >> >> >  version | bucket_pages | overflow_pages | bitmap_pages | unused_pages | live_items | dead_items |   free_percent
> >> >> >> > ---------+--------------+----------------+--------------+--------------+------------+------------+------------------
> >> >> >> >        3 |     10485760 |        2131192 |           66 |            0 | 2975444240 |          0 | 1065.19942179026
> >> >> >> > (1 row)
> >> > ...
> >> >> >> > And I do appear to have an odd percentage of free space. :)
> >> >>
> >> >> Are you worried about "unused_pages"? If so, then this is not a major
> >> >
> >> > Nope. "free_percent" Just a bit weird that I have it at over 1000% free. :)
> >> > Shouldn't that number be < 100?
> >>
> >> Yes, there seems to be some gotcha in free percent calculation.  Is it
> >> possible for you to debug or in some way share the test?
> >
> > I can try to debug but I need to know what to look for and how.
> 
> Okay,  you need to debug function pgstathashindex and have your
> breakpoint at free_percent calculation, then try to get the values of
> nblocks, all the values in stats struct and total_space.  I think
> after getting this info we can further decide what to look for.

Ok. I'll try and get to this tomorrow amidst fun with NFS. Hopefully
there'll be time.

So... I'll need

postgresql-10-dbg - debug symbols for postgresql-10

Then given https://doxygen.postgresql.org/pgstatindex_8c.html#af86e3b4c40779d4f30a73b0bfe06316f
set a breakpoint at pgstatindex.c:710 via gdb and then have fun with
print?

> > As for sharing the test, that'd mean sharing the data. If it helps I can
> > provide the content of that column but you're looking at an sql dump that
> > is roughly (2*64+1)*2.3 billion (give or take a (few) billion) in size. :)
> 
> This is tricky, will I be able to import that column values by creating
> table, if so, then probably it is worth.

Should do. Thinking about it a little more, I can shrink the file down by
roughly half if I don't do a pg_dump or similar. Doing

COPY link (datum_id) TO '/tmp/moocow.copy' WITH (FORMAT BINARY)

should allow you to use COPY FROM to restore the file and produce something
a lot smaller than a dump, right?
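
A minimal sketch of the restore side, assuming a table with the matching
single column already exists at the receiving end:

-- Load the binary dump back in:
COPY link (datum_id) FROM '/tmp/moocow.copy' WITH (FORMAT BINARY);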

The table is simple:

CREATE TABLE link (datum_id BYTEA);

I can't give you the rest of the table (one other column) as the stuff hidden
in there is private.

The only thing that wont give you is the manner in which the column is filled
(ie: the transactions, their size, how long they run, their concurrency etc).
Don't know if that's important.

> >> So at this stage, there are two possibilities for you (a) run manual
> >> Vacuum in-between (b) create the index after bulk load.  In general,
> >> whatever I have mentioned in (b) is a better way for bulk loading.
> >> Note here again the free_percent seems to be wrong.
> >
> > If you didn't mean VACUUM FULL then (a) does not appear to work and (b)
> > would kill usability of the db during import, which would happen daily
> > (though with a vastly reduced data size).
> 
> If the data size in subsequent import is very less, then you only need
> to Create the index after first import and then let it continue like
> that.

Managing this has suddenly gotten a lot more complex. :)

> > It also messes with the
> > permission model that has been set up for the least-trusted section of
> > the project (at the moment that section can only INSERT).
> 
> As per your permission model Vacuum Full is allowed, but not Create index?

Well, at the moment it's all in development (though time for that to end
is coming up). As such I can do things with enhanced permissions manually.

When it hits production, that rather stops.

> As mentioned above REINDEX might be a better option.  I think for such
> situation we should have some provision to allow squeeze functionality
> of hash exposed to the user, this will be less costly than REINDEX and
> might serve the purpose for the user.  Hey, can you try some hack in

Assuming it does help, would this be something one would need to guess
at? "I did a whole bunch of concurrent INSERT heavy transactions so I
guess I should do a squeeze now"?

Or could it be figured out programmatically?

> the code which can at least tell us whether squeezing hash can give us
> any benefit or is the index in such a situation that only REINDEX will
> help.  Basically, while doing Vacuum (non-full), you need to somehow
> bypass the below line in lazy_scan_heap
> lazy_scan_heap
> {
> ..
> if (vacrelstats->num_dead_tuples > 0)
> ..
> }
> 
> I am not sure it will work, but we can try once if you are okay.

So this would be at L1284 of https://doxygen.postgresql.org/vacuumlazy_8c.html#a59a677fb19b0aae6a57c87ae073e081b ?

Would turning it into if (1) and recompiling be the way to go?

Not sure if I'd be able to do this before or after the weekend.

AP



Re: [HACKERS] pgsql 10: hash indexes testing

From
Amit Kapila
Date:
On Thu, Jul 6, 2017 at 5:04 PM, AP <ap@zip.com.au> wrote:
> On Thu, Jul 06, 2017 at 12:38:38PM +0530, Amit Kapila wrote:
>> On Thu, Jul 6, 2017 at 9:32 AM, AP <ap@zip.com.au> wrote:
>> > On Thu, Jul 06, 2017 at 08:52:03AM +0530, Amit Kapila wrote:
>> >> On Thu, Jul 6, 2017 at 2:40 AM, AP <ap@zip.com.au> wrote:
>> >> > On Wed, Jul 05, 2017 at 05:52:32PM +0530, Amit Kapila wrote:
>> >> >> >> >  version | bucket_pages | overflow_pages | bitmap_pages | unused_pages | live_items | dead_items |   free_percent
>> >> >> >> > ---------+--------------+----------------+--------------+--------------+------------+------------+------------------
>> >> >> >> >        3 |     10485760 |        2131192 |           66 |            0 | 2975444240 |          0 | 1065.19942179026
>> >> >> >> > (1 row)
>> >> > ...
>> >> >> >> > And I do appear to have an odd percentage of free space. :)
>> >> >>
>> >> >> Are you worried about "unused_pages"? If so, then this is not a major
>> >> >
>> >> > Nope. "free_percent" Just a bit weird that I have it at over 1000% free. :)
>> >> > Shouldn't that number be < 100?
>> >>
>> >> Yes, there seems to be some gotcha in free percent calculation.  Is it
>> >> possible for you to debug or in some way share the test?
>> >
>> > I can try to debug but I need to know what to look for and how.
>>
>> Okay,  you need to debug function pgstathashindex and have your
>> breakpoint at free_percent calculation, then try to get the values of
>> nblocks, all the values in stats struct and total_space.  I think
>> after getting this info we can further decide what to look for.
>
> Ok. I'll try and get to this tomorrow amidst fun with NFS. Hopefully
> there'll be time.
>

Cool.

> So... I'll need
>
> postgresql-10-dbg - debug symbols for postgresql-10
>
> Then given https://doxygen.postgresql.org/pgstatindex_8c.html#af86e3b4c40779d4f30a73b0bfe06316f
> set a breakpoint at pgstatindex.c:710 via gdb and then have fun with
> print?
>

If I am reading it correctly it should be line 706 as below:
if (get_call_result_type(fcinfo, NULL, &tupleDesc) != TYPEFUNC_COMPOSITE)

>> > As for sharing the test, that'd mean sharing the data. If it helps I can
>> > provide the content of that column but you're looking at an sql dump that
>> > is roughly (2*64+1)*2.3 billion (give or take a (few) billion) in size. :)
>>
>> This is tricky, will I be able to import that column values by creating
>> table, if so, then probably it is worth.
>
> Should do. Thinking about it a little more, I can shrink the file down by
> roughly half if I don't do a pg_dump or similar. Doing
>
> COPY link (datum_id) TO '/tmp/moocow.copy' WITH (FORMAT BINARY)
>
> should allow you to use COPY FROM to restore the file and produce something
> a lot smaller than a dump, right?
>

I think so.  You can share it. I will try.

> The table is simple:
>
> CREATE TABLE link (datum_id BYTEA);
>
> I can't give you the rest of the table (one other column) as the stuff hidden
> in there is private.
>

No issues.

> The only thing that wont give you is the manner in which the column is filled
> (ie: the transactions, their size, how long they run, their concurrency etc).
> Don't know if that's important.
>
>> >> So at this stage, there are two possibilities for you (a) run manual
>> >> Vacuum in-between (b) create the index after bulk load.  In general,
>> >> whatever I have mentioned in (b) is a better way for bulk loading.
>> >> Note here again the free_percent seems to be wrong.
>> >
>> > If you didn't mean VACUUM FULL then (a) does not appear to work and (b)
>> > would kill usability of the db during import, which would happen daily
>> > (though with a vastly reduced data size).
>>
>> If the data size in subsequent import is very less, then you only need
>> to Create the index after first import and then let it continue like
>> that.
>
> Managing this has suddenly gotten a lot more complex. :)
>
>> > It also messes with the
>> > permission model that has been set up for the least-trusted section of
>> > the project (at the moment that section can only INSERT).
>>
>> As per your permission model Vacuum Full is allowed, but not Create index?
>
> Well, at the moment it's all in development (though time for that to end
> is coming up). As such I can do things with enhanced permissions manually.
>

I think if you are under development, it is always advisable to create
indexes after initial bulk load.  That way it will be faster and will
take lesser space atleast in case of hash index.

> When it hits production, that rather stops.
>
>> As mentioned above REINDEX might be a better option.  I think for such
>> situation we should have some provision to allow squeeze functionality
>> of hash exposed to the user, this will be less costly than REINDEX and
>> might serve the purpose for the user.  Hey, can you try some hack in
>
> Assuming it does help, would this be something one would need to guess
> at? "I did a whole bunch of concurrent INSERT heavy transactions so I
> guess I should do a squeeze now"?
>
> Or could it be figured out programmatically?
>

I think one can refer free_percent and number of overflow pages to
perform such a command.  It won't be 100% correct, but we can make a
guess.  We can even check free space in overflow pages with page
inspect to make it more accurate.
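
A hypothetical sketch of such a check with pageinspect's hash functions (the
block number is only an example and must point at a bucket or overflow page):

-- Page type and free space for one block of the index:
SELECT hash_page_type(get_raw_page('link_datum_id_idx', 5));

SELECT live_items, dead_items, page_size, free_size
FROM hash_page_stats(get_raw_page('link_datum_id_idx', 5));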

>> the code which can at least tell us whether squeezing hash can give us
>> any benefit or is the index in such a situation that only REINDEX will
>> help.  Basically, while doing Vacuum (non-full), you need to somehow
>> bypass the below line in lazy_scan_heap
>> lazy_scan_heap
>> {
>> ..
>> if (vacrelstats->num_dead_tuples > 0)
>> ..
>> }
>>
>> I am not sure it will work, but we can try once if you are okay.
>
> So this would be at L1284 of https://doxygen.postgresql.org/vacuumlazy_8c.html#a59a677fb19b0aae6a57c87ae073e081b ?
>
> Would turning it into if (1) and recompiling be the way to go?
>

Yes.

> Not sure if I'd be able to do this before or after the weekend.
>

No problem.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] pgsql 10: hash indexes testing

From
AP
Date:
On Wed, Jul 05, 2017 at 07:31:39PM +1000, AP wrote:
> On Tue, Jul 04, 2017 at 08:23:20PM -0700, Jeff Janes wrote:
> > On Tue, Jul 4, 2017 at 3:57 AM, AP <ap@zip.com.au> wrote:
> > > The data being indexed is BYTEA, (quasi)random and 64 bytes in size.
> > > The table has over 2 billion entries. The data is not unique. There's
> > > an average of 10 duplicates for every unique value.
> > 
> > What is the number of duplicates for the most common value?
> 
> Damn. Was going to collect this info as I was doing a fresh upload but
> it fell through the cracks of my mind. It'll probably take at least
> half a day to collect (a simple count(*) on the table takes 1.5-1.75
> hours parallelised across 11 processes) so I'll probably have this in
> around 24 hours if all goes well. (and I don't stuff up the SQL :) )

Well...
 num_ids |  count
---------+----------
       1 | 91456442
       2 | 56224976
       4 | 14403515
      16 | 13665967
       3 | 12929363
      17 | 12093367
      15 | 10347006


So the most common is a unique value, then a dupe.

AP.



Re: [HACKERS] pgsql 10: hash indexes testing

From
AP
Date:
On Thu, Jul 06, 2017 at 05:19:59PM +0530, Amit Kapila wrote:
> I think if you are under development, it is always advisable to create
> indexes after initial bulk load.  That way it will be faster and will
> take lesser space atleast in case of hash index.

This is a bit of a pickle, actually:
* if I do have a hash index I'll wind up with a bloated one at some stage
  that refused to allow more inserts until the index is re-created
* if I don't have an index then I'll wind up with a table where I cannot
  create a hash index because it has too many rows for it to handle

I'm at a bit of a loss as to how to deal with this. The DB design does come
with a kind of partitioning where a bundle of tables get put off to the side
and searched seperately as needed but too many of those and the impact on
performance can be noticed so I need to minimise them.

> >> As mentioned above REINDEX might be a better option.  I think for such
> >> situation we should have some provision to allow squeeze functionality
> >> of hash exposed to the user, this will be less costly than REINDEX and
> >> might serve the purpose for the user.  Hey, can you try some hack in
> >
> > Assuming it does help, would this be something one would need to guess
> > at? "I did a whole bunch of concurrent INSERT heavy transactions so I
> > guess I should do a squeeze now"?
> >
> > Or could it be figured out programmatically?
> 
> I think one can refer free_percent and number of overflow pages to
> perform such a command.  It won't be 100% correct, but we can make a
> guess.  We can even check free space in overflow pages with page
> inspect to make it more accurate.

Does this take much time? Main reason I am asking is that this looks like
something that the db ought to handle underneath (say as part of an autovac
run) and so if there are stats that the index code can maintain that can
then be used by the autovac (or something) code to trigger a cleanup this
I think would be of benefit.

Unless I am being /so/ unusual that it's not worth it. :)

I'll reply to the rest in a separate stream as I'm still poking other
work related things atm so can't do the debug testing as yet.

AP



Re: [HACKERS] pgsql 10: hash indexes testing

From
Amit Kapila
Date:
On Fri, Jul 7, 2017 at 8:22 AM, AP <ap@zip.com.au> wrote:
> On Thu, Jul 06, 2017 at 05:19:59PM +0530, Amit Kapila wrote:
>> I think if you are under development, it is always advisable to create
>> indexes after initial bulk load.  That way it will be faster and will
>> take lesser space atleast in case of hash index.
>
> This is a bit of a pickle, actually:
> * if I do have a hash index I'll wind up with a bloated one at some stage
>   that refused to allow more inserts until the index is re-created
> * if I don't have an index then I'll wind up with a table where I cannot
>   create a hash index because it has too many rows for it to handle
>
> I'm at a bit of a loss as to how to deal with this.
>

I can understand your concerns.  To address first concern we need to
work on one or more of following work items: (a) work on vacuums that
can be triggered on insert only workload (it should perform index
vacuum as well) (b) separate utility statement/function to squeeze
hash index (c) db internally does squeezing like after each split, so
that chances of such a problem can be reduced, but that will be at the
cost of performance reduction in other workloads, so not sure if it is
advisable.  Among these (b) is simplest to do but may not be
convenient for the user.

To address your second concern, we need to speed up the creation of
hash index which is a relatively big project.  Having said that, I
think in your case, this is one-time operation so spending once more
time might be okay.

>
>> >> As mentioned above REINDEX might be a better option.  I think for such
>> >> situation we should have some provision to allow squeeze functionality
>> >> of hash exposed to the user, this will be less costly than REINDEX and
>> >> might serve the purpose for the user.  Hey, can you try some hack in
>> >
>> > Assuming it does help, would this be something one would need to guess
>> > at? "I did a whole bunch of concurrent INSERT heavy transactions so I
>> > guess I should do a squeeze now"?
>> >
>> > Or could it be figured out programmatically?
>>
>> I think one can refer free_percent and number of overflow pages to
>> perform such a command.  It won't be 100% correct, but we can make a
>> guess.  We can even check free space in overflow pages with page
>> inspect to make it more accurate.
>
> Does this take much time? Main reason I am asking is that this looks like
> something that the db ought to handle underneath (say as part of an autovac
> run) and so if there are stats that the index code can maintain that can
> then be used by the autovac (or something) code to trigger a cleanup this
> I think would be of benefit.
>

Sure, I agree that database should automatically handle bloat, but as
said above this will be a separate project and may not be straight
forward.



-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] pgsql 10: hash indexes testing

From
AP
Date:
On Fri, Jul 07, 2017 at 05:58:25PM +0530, Amit Kapila wrote:
> On Fri, Jul 7, 2017 at 8:22 AM, AP <ap@zip.com.au> wrote:
> > On Thu, Jul 06, 2017 at 05:19:59PM +0530, Amit Kapila wrote:
> >> I think if you are under development, it is always advisable to create
> >> indexes after initial bulk load.  That way it will be faster and will
> >> take lesser space atleast in case of hash index.
> >
> > This is a bit of a pickle, actually:
> > * if I do have a hash index I'll wind up with a bloated one at some stage
> >   that refused to allow more inserts until the index is re-created
> > * if I don't have an index then I'll wind up with a table where I cannot
> >   create a hash index because it has too many rows for it to handle
> >
> > I'm at a bit of a loss as to how to deal with this.
> 
> I can understand your concerns.  To address first concern we need to
> work on one or more of following work items: (a) work on vacuums that
> can be triggered on insert only workload (it should perform index
> vacuum as well) (b) separate utility statement/function to squeeze
> hash index (c) db internally does squeezing like after each split, so
> that chances of such a problem can be reduced, but that will be at the
> cost of performance reduction in other workloads, so not sure if it is
> advisable.  Among these (b) is simplest to do but may not be
> convenient for the user.

(a) seems like a good compromise on (c) if it can be done without disruption
    and in time.
(b) seems analogous to the path autovacuum took. Unless I misremember, before
    autovacuum we had a cronjob to do similar work. It's probably a sane path
    to take as a first step on the way to (a)
(c) may not be worth the effort if it compromises general use, though perhaps
    it could be used to indicate to (a) that now is a good time to handle
    this bit?

> To address your second concern, we need to speed up the creation of
> hash index which is a relatively big project.  Having said that, I
> think in your case, this is one-time operation so spending once more
> time might be okay.

Yup. Primarily I just wanted the idea out there that this isn't that easy
to cope with manually and to get it onto a todo list (unless it was an
easy thing to do given a bit of thought but it appears not).

Out of curiosity, and apologies if you explained it already and I missed
the significance of the words, how does this bloat happen? These tables
only cop COPY. There is no UPDATE or DELETE; all transactions get COMMITted
so there's no ROLLBACK undoing the COPY and yet the bloat occurs.

AP



Re: [HACKERS] pgsql 10: hash indexes testing

От
Amit Kapila
Дата:
On Tue, Jul 11, 2017 at 6:51 AM, AP <ap@zip.com.au> wrote:
> On Fri, Jul 07, 2017 at 05:58:25PM +0530, Amit Kapila wrote:
>> On Fri, Jul 7, 2017 at 8:22 AM, AP <ap@zip.com.au> wrote:
>> > On Thu, Jul 06, 2017 at 05:19:59PM +0530, Amit Kapila wrote:
>> >> I think if you are under development, it is always advisable to create
>> >> indexes after initial bulk load.  That way it will be faster and will
>> >> take lesser space atleast in case of hash index.
>> >
>> > This is a bit of a pickle, actually:
>> > * if I do have a hash index I'll wind up with a bloated one at some stage
>> >   that refused to allow more inserts until the index is re-created
>> > * if I don't have an index then I'll wind up with a table where I cannot
>> >   create a hash index because it has too many rows for it to handle
>> >
>> > I'm at a bit of a loss as to how to deal with this.
>>
>> I can understand your concerns.  To address first concern we need to
>> work on one or more of following work items: (a) work on vacuums that
>> can be triggered on insert only workload (it should perform index
>> vacuum as well) (b) separate utility statement/function to squeeze
>> hash index (c) db internally does squeezing like after each split, so
>> that chances of such a problem can be reduced, but that will be at the
>> cost of performance reduction in other workloads, so not sure if it is
>> advisable.  Among these (b) is simplest to do but may not be
>> convenient for the user.
>
> (a) seems like a good compromise on (c) if it can be done without disruption
>     and in time.
> (b) seems analogous to the path autovcauum took. Unless I misremember, before
>     autovacuum we had a cronjob to do similar work. It's probably a sane path
>     to take as a first step on the way to (a)
> (c) may not be worth the effort if it compromises general use, though perhaps
>     it could be used to indicate to (a) that now is a good time to handle
>     this bit?
>

Nice summarization!  I think before doing anything of that sort we
need opinions from others as well.  If some other community members
also see value in doing one or multiple of above things, then I can
write a patch.

>> To address your second concern, we need to speed up the creation of
>> hash index which is a relatively big project.  Having said that, I
>> think in your case, this is one-time operation so spending once more
>> time might be okay.
>
> Yup. Primarily I just wanted the idea out there that this isn't that easy
> to cope with manually and to get it onto a todo list (unless it was an
> easy thing to do given a bit of thought but it appears not).
>
> Out of curiosity, and apologies if you explained it already and I missed
> the signficance of the words, how does this bloat happen?
>

You might want to read src/backend/access/hash/README.  During a split
operation, we copy tuples from the old bucket (the bucket being split)
to the new bucket (the bucket being populated) and, once all the tuples
are copied and there is no prior scan left which started during the
split on the buckets involved, we remove the tuples from the old
bucket.  Now, as we might need to wait for those scans to finish, we
prefer to perform that removal during vacuum or during the next split
from that bucket.  Until the tuples are removed from the old bucket,
there will be some bloat in the system.
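
A rough way to watch this from SQL, using the pgstattuple function already
shown earlier in the thread, is to snapshot the index stats, run a manual
VACUUM on the parent table (the table name below is only a placeholder), and
compare; the overflow/unused page counts and free_percent should reflect the
cleanup of the old buckets once the vacuum has run:

  -- before: bloat left behind by bucket splits
  select overflow_pages, unused_pages, free_percent
    from pgstathashindex('link_datum_id_idx');

  -- force the cleanup that would otherwise wait for the next split
  vacuum the_indexed_table;  -- placeholder for the table the index is on

  -- after: the same query should now show the effect of the cleanup
  select overflow_pages, unused_pages, free_percent
    from pgstathashindex('link_datum_id_idx');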


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] pgsql 10: hash indexes testing

От
Alvaro Herrera
Дата:
Amit Kapila wrote:
> On Tue, Jul 11, 2017 at 6:51 AM, AP <ap@zip.com.au> wrote:
> > On Fri, Jul 07, 2017 at 05:58:25PM +0530, Amit Kapila wrote:

> >> I can understand your concerns.  To address first concern we need to
> >> work on one or more of following work items: (a) work on vacuums that
> >> can be triggered on insert only workload (it should perform index
> >> vacuum as well) (b) separate utility statement/function to squeeze
> >> hash index (c) db internally does squeezing like after each split, so
> >> that chances of such a problem can be reduced, but that will be at the
> >> cost of performance reduction in other workloads, so not sure if it is
> >> advisable.  Among these (b) is simplest to do but may not be
> >> convenient for the user.
> >
> > (a) seems like a good compromise on (c) if it can be done without disruption
> >     and in time.
> > (b) seems analogous to the path autovcauum took. Unless I misremember, before
> >     autovacuum we had a cronjob to do similar work. It's probably a sane path
> >     to take as a first step on the way to (a)
> > (c) may not be worth the effort if it compromises general use, though perhaps
> >     it could be used to indicate to (a) that now is a good time to handle
> >     this bit?
> 
> Nice summarization!  I think before doing anything of that sort we
> need opinions from others as well.  If some other community members
> also see value in doing one or multiple of above things, then I can
> write a patch.

I haven't read the thread, but in late PG10 autovacuum gained the idea
of "work items" that can be plugged from other parts of the server;
currently BRIN uses it to cause a range to be summarized right after the
next one starts being filled.  This is activated separately for each
index via a reloption.  Perhaps something like that can be used for hash
indexes?  See commit 7526e10224f0792201e99631567bbe44492bbde4.
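
For reference, the BRIN side of this is exposed as a per-index reloption
(autosummarize, if memory serves), so a hash equivalent would presumably be
spelled the same way.  A sketch, with hypothetical table/column names, and
with the hash reloption shown purely as a strawman since no such option
exists:

  -- BRIN today: autovacuum picks up a work item to summarize new ranges
  CREATE INDEX events_created_at_brin ON events USING brin (created_at)
    WITH (autosummarize = on);

  -- hypothetical hash analogue of the above (NOT implemented):
  -- CREATE INDEX events_datum_hash ON events USING hash (datum_id)
  --   WITH (autosqueeze = on);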

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] pgsql 10: hash indexes testing

От
Amit Kapila
Дата:
On Tue, Jul 11, 2017 at 8:10 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> Amit Kapila wrote:
>> On Tue, Jul 11, 2017 at 6:51 AM, AP <ap@zip.com.au> wrote:
>> > On Fri, Jul 07, 2017 at 05:58:25PM +0530, Amit Kapila wrote:
>
>> >> I can understand your concerns.  To address first concern we need to
>> >> work on one or more of following work items: (a) work on vacuums that
>> >> can be triggered on insert only workload (it should perform index
>> >> vacuum as well) (b) separate utility statement/function to squeeze
>> >> hash index (c) db internally does squeezing like after each split, so
>> >> that chances of such a problem can be reduced, but that will be at the
>> >> cost of performance reduction in other workloads, so not sure if it is
>> >> advisable.  Among these (b) is simplest to do but may not be
>> >> convenient for the user.
>> >
>> > (a) seems like a good compromise on (c) if it can be done without disruption
>> >     and in time.
>> > (b) seems analogous to the path autovcauum took. Unless I misremember, before
>> >     autovacuum we had a cronjob to do similar work. It's probably a sane path
>> >     to take as a first step on the way to (a)
>> > (c) may not be worth the effort if it compromises general use, though perhaps
>> >     it could be used to indicate to (a) that now is a good time to handle
>> >     this bit?
>>
>> Nice summarization!  I think before doing anything of that sort we
>> need opinions from others as well.  If some other community members
>> also see value in doing one or multiple of above things, then I can
>> write a patch.
>
> I haven't read the thread, but in late PG10 autovacuum gained the idea
> of "work items" that can be plugged from other parts of the server;
> currently BRIN uses it to cause a range to be summarized right after the
> next one starts being filled.  This is activated separately for each
> index via a reloption.  Perhaps something like that can be used for hash
> indexes?  See commit 7526e10224f0792201e99631567bbe44492bbde4.
>

Yes, I also think the same idea can be used, in fact, I have mentioned
it [1] as soon as you have committed that patch.  Do we want to do
anything at this stage for PG-10?  I don't think we should attempt
something this late unless people feel this is a show-stopper issue
for usage of hash indexes.  If required, I think a separate function
can be provided to allow users to perform squeeze operation.

[1] - https://www.postgresql.org/message-id/CAA4eK1%2BnVxAGmzj7RPUHscbTdAG1zQub6L9UqFJSu%3DgdwtY%2BpQ%40mail.gmail.com

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] pgsql 10: hash indexes testing

От
Alvaro Herrera
Дата:
Amit Kapila wrote:

> Yes, I also think the same idea can be used, in fact, I have mentioned
> it [1] as soon as you have committed that patch.  Do we want to do
> anything at this stage for PG-10?  I don't think we should attempt
> something this late unless people feel this is a show-stopper issue
> for usage of hash indexes.  If required, I think a separate function
> can be provided to allow users to perform squeeze operation.

Sorry, I have no idea how critical this squeeze thing is for the
newfangled hash indexes, so I cannot comment on that.  Does this make
the indexes unusable in some way under some circumstances?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] pgsql 10: hash indexes testing

От
Amit Kapila
Дата:
On Tue, Jul 11, 2017 at 11:08 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> Amit Kapila wrote:
>
>> Yes, I also think the same idea can be used, in fact, I have mentioned
>> it [1] as soon as you have committed that patch.  Do we want to do
>> anything at this stage for PG-10?  I don't think we should attempt
>> something this late unless people feel this is a show-stopper issue
>> for usage of hash indexes.  If required, I think a separate function
>> can be provided to allow users to perform squeeze operation.
>
> Sorry, I have no idea how critical this squeeze thing is for the
> newfangled hash indexes, so I cannot comment on that.  Does this make
> the indexes unusable in some way under some circumstances?
>

It seems so.  Basically, in the case of a large number of duplicates,
we hit the maximum number of overflow pages.  There is a theoretical
possibility of hitting it, but it could also happen that we are not
freeing the existing unused overflow pages, due to which the count
keeps growing until it hits the limit.  I have requested up thread to
verify whether that is happening in this case and I am still waiting
for the result.  The squeeze operation does free such unused overflow
pages after cleaning them.  As this is a costly operation and needs a
cleanup lock, we currently perform it only during vacuum and during the
next split from a bucket which can have redundant overflow pages.
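
For what it's worth, whether unused overflow pages are piling up can be
watched with the same pgstattuple function used earlier in this thread: if
overflow_pages keeps climbing across bulk loads while free_percent stays
high, the squeeze is probably not running often enough.  Something along
these lines (index name as used in this thread):

  select overflow_pages,
         bitmap_pages,
         unused_pages,
         free_percent,
         overflow_pages::numeric / nullif(bitmap_pages, 0) as overflow_per_bitmap_page
    from pgstathashindex('link_datum_id_idx');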



-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] pgsql 10: hash indexes testing

От
Robert Haas
Дата:
On Wed, Jul 12, 2017 at 1:10 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>> Yes, I also think the same idea can be used, in fact, I have mentioned
>>> it [1] as soon as you have committed that patch.  Do we want to do
>>> anything at this stage for PG-10?  I don't think we should attempt
>>> something this late unless people feel this is a show-stopper issue
>>> for usage of hash indexes.  If required, I think a separate function
>>> can be provided to allow users to perform squeeze operation.
>>
>> Sorry, I have no idea how critical this squeeze thing is for the
>> newfangled hash indexes, so I cannot comment on that.  Does this make
>> the indexes unusable in some way under some circumstances?
>
> It seems so.  Basically, in the case of a large number of duplicates,
> we hit the maximum number of overflow pages.  There is a theoretical
> possibility of hitting it but it could also happen that we are not
> free the existing unused overflow pages due to which it keeps on
> growing and hit the limit.  I have requested up thread to verify if
> that is happening in this case and I am still waiting for same.  The
> squeeze operation does free such unused overflow pages after cleaning
> them.  As this is a costly operation and needs a cleanup lock, so we
> currently perform it only during Vacuum and next split from the bucket
> which can have redundant overflow pages.

Oops.  It was rather short-sighted of us not to increase
HASH_MAX_BITMAPS when we bumped HASH_VERSION.  Actually removing that
limit is hard, but we could have easily bumped it from 128 to, say, 1024
without (I think) causing any problem, which would have given us quite
a bit of headroom here.  I suppose we could still try to jam that
change in before beta3 (bumping HASH_VERSION again) but that might be
asking for trouble.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] pgsql 10: hash indexes testing

От
AP
Дата:
On Wed, Aug 02, 2017 at 11:34:13AM -0400, Robert Haas wrote:
> On Wed, Jul 12, 2017 at 1:10 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > It seems so.  Basically, in the case of a large number of duplicates,
> > we hit the maximum number of overflow pages.  There is a theoretical
> > possibility of hitting it but it could also happen that we are not
> > free the existing unused overflow pages due to which it keeps on
> > growing and hit the limit.  I have requested up thread to verify if
> > that is happening in this case and I am still waiting for same.  The
> > squeeze operation does free such unused overflow pages after cleaning
> > them.  As this is a costly operation and needs a cleanup lock, so we
> > currently perform it only during Vacuum and next split from the bucket
> > which can have redundant overflow pages.
> 
> Oops.  It was rather short-sighted of us not to increase
> HASH_MAX_BITMAPS when we bumped HASH_VERSION.  Actually removing that
> limit is hard, but we could have easily bumped it for 128 to say 1024
> without (I think) causing any problem, which would have given us quite
> a bit of headroom here.  I suppose we could still try to jam that
> change in before beta3 (bumping HASH_VERSION again) but that might be
> asking for trouble.

I, for one, would be grateful for such a bump (or better). Currently
having more fun than I wish trying to figure out where my inserts will
begin to fail so that I don't make a mess of things. Right now I can't
match data going in with sane partitioning points in-storage. If I can
go "3 months worth of data then partition" or the like and have enough
room for differences in the data then I can be pretty happy. ATM I'm
not getting anywhere near that and am tempted to chuck it all in, eat
the 3-4x disk space cost and go back to btree which'd cost me terabytes.

AP



Re: [HACKERS] pgsql 10: hash indexes testing

От
Amit Kapila
Дата:
On Wed, Aug 2, 2017 at 9:04 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Jul 12, 2017 at 1:10 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>>> Yes, I also think the same idea can be used, in fact, I have mentioned
>>>> it [1] as soon as you have committed that patch.  Do we want to do
>>>> anything at this stage for PG-10?  I don't think we should attempt
>>>> something this late unless people feel this is a show-stopper issue
>>>> for usage of hash indexes.  If required, I think a separate function
>>>> can be provided to allow users to perform squeeze operation.
>>>
>>> Sorry, I have no idea how critical this squeeze thing is for the
>>> newfangled hash indexes, so I cannot comment on that.  Does this make
>>> the indexes unusable in some way under some circumstances?
>>
>> It seems so.  Basically, in the case of a large number of duplicates,
>> we hit the maximum number of overflow pages.  There is a theoretical
>> possibility of hitting it but it could also happen that we are not
>> free the existing unused overflow pages due to which it keeps on
>> growing and hit the limit.  I have requested up thread to verify if
>> that is happening in this case and I am still waiting for same.  The
>> squeeze operation does free such unused overflow pages after cleaning
>> them.  As this is a costly operation and needs a cleanup lock, so we
>> currently perform it only during Vacuum and next split from the bucket
>> which can have redundant overflow pages.
>
> Oops.  It was rather short-sighted of us not to increase
> HASH_MAX_BITMAPS when we bumped HASH_VERSION.  Actually removing that
> limit is hard, but we could have easily bumped it for 128 to say 1024
> without (I think) causing any problem, which would have given us quite
> a bit of headroom here.

Yes, that sounds sensible, but I think it will just delay the problem.
I think here the actual problem is that we are not able to perform the
squeeze operation often enough to free the overflow pages.  Currently,
we try to perform the squeeze only at the start of the next split of
the bucket or during vacuum.  The reason for doing it that way was that
the squeeze operation needs a cleanup lock and we already have that
during the start of a split and during vacuum.  Now, to solve it I have
already speculated on a few ways above [1]; among those, it is feasible
either to do this at the end of a split, which can have performance
implications in some workloads but will work fine for the case reported
in this thread, or to somehow trigger a vacuum (like we do for the BRIN
index in commit 7526e10224f0792201e99631567bbe44492bbde4).

I think we can fix it in one of the above ways and increase the value of
HASH_MAX_BITMAPS as well.

What do you say?

>  I suppose we could still try to jam that
> change in before beta3 (bumping HASH_VERSION again) but that might be
> asking for trouble.
>

I am not sure if we have any other option if we decide to increase the
value of HASH_MAX_BITMAPS. BTW, do you think some users will rely on
hash indexes built on one of the beta versions?

Note - AP has off list shared the data dump and we (Ashutosh Sharma
and me) are able to reproduce the problem and we could see that if we
force vacuum via the debugger, then it is able to free overflow pages.
The exact numbers are not available at this stage as the test is not
complete.

[1] - https://www.postgresql.org/message-id/CAA4eK1KKq80BYOc%2BmcmHcQzV0Mcs3AHGjEEf--TnLaJbkeTgmg%40mail.gmail.com
-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] pgsql 10: hash indexes testing

От
AP
Дата:
On Fri, Aug 04, 2017 at 08:21:01AM +0530, Amit Kapila wrote:
> Note - AP has off list shared the data dump and we (Ashutosh Sharma
> and me) are able to reproduce the problem and we could see that if we
> force vacuum via the debugger, then it is able to free overflow pages.
> The exact numbers are not available at this stage as the test is not
> complete.

I've another if you would like it. I COPYed with FILLFACTOR of 10 and
it eventually failed but I could not recreate the index (via CREATE INDEX
CONCURRENTLY) with the data that made it using a fillfactor of 100. If
I created the index again (again with the same data) with fillfactor 10
then it completed.

I may be completely misunderstanding fillfactor but I always thought it was
a performance optimisation rather than something that may allow you to store
more (or less) index entries.

The stats for the various indexes are:

After COPYs started failing:
fmmdstash=# select overflow_pages/bitmap_pages/8,* from pgstathashindex('link_datum_id_idx');
 ?column? | version | bucket_pages | overflow_pages | bitmap_pages | unused_pages | live_items | dead_items |   free_percent
----------+---------+--------------+----------------+--------------+--------------+------------+------------+------------------
     4095 |       3 |    103782169 |        4194176 |          128 |     13658343 | 5085570007 |          0 | 21014.6558371539
(1 row)

Time: 6146310.494 ms (01:42:26.310)

After the CREATE INDEX CONCURRENTLY with FILLFACTOR 100 failed:
fmmdstash=# select overflow_pages/bitmap_pages/8,* from pgstathashindex('link_datum_id_idx1');
 ?column? | version | bucket_pages | overflow_pages | bitmap_pages | unused_pages | live_items | dead_items |  free_percent
----------+---------+--------------+----------------+--------------+--------------+------------+------------+-----------------
     4095 |       3 |      6205234 |        4194176 |          128 |        86222 | 3080760746 |          0 | 615.91682922039
(1 row)

Time: 19128.527 ms (00:19.129)

After the CREATE INDEX CONCURRENTLY with FILLFACTOR 10 succeeded:
fmmdstash=# select overflow_pages/bitmap_pages/8,* from pgstathashindex('link_datum_id_idx2');
 ?column? | version | bucket_pages | overflow_pages | bitmap_pages | unused_pages | live_items | dead_items |   free_percent
----------+---------+--------------+----------------+--------------+--------------+------------+------------+------------------
     3062 |       3 |     79677471 |        2572565 |          105 |      5074888 | 3187098806 |          0 | 19027.2399324415
(1 row)

Time: 1557509.940 ms (25:57.510)

The DB in question is now gone but I took a copy of the column as per
before so if you'd like it I can make it available via the same means.

AP



Re: [HACKERS] pgsql 10: hash indexes testing

От
Amit Kapila
Дата:
On Fri, Aug 4, 2017 at 8:21 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Wed, Aug 2, 2017 at 9:04 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Wed, Jul 12, 2017 at 1:10 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>>>> Yes, I also think the same idea can be used, in fact, I have mentioned
>>>>> it [1] as soon as you have committed that patch.  Do we want to do
>>>>> anything at this stage for PG-10?  I don't think we should attempt
>>>>> something this late unless people feel this is a show-stopper issue
>>>>> for usage of hash indexes.  If required, I think a separate function
>>>>> can be provided to allow users to perform squeeze operation.
>>>>
>>>> Sorry, I have no idea how critical this squeeze thing is for the
>>>> newfangled hash indexes, so I cannot comment on that.  Does this make
>>>> the indexes unusable in some way under some circumstances?
>>>
>>> It seems so.  Basically, in the case of a large number of duplicates,
>>> we hit the maximum number of overflow pages.  There is a theoretical
>>> possibility of hitting it but it could also happen that we are not
>>> free the existing unused overflow pages due to which it keeps on
>>> growing and hit the limit.  I have requested up thread to verify if
>>> that is happening in this case and I am still waiting for same.  The
>>> squeeze operation does free such unused overflow pages after cleaning
>>> them.  As this is a costly operation and needs a cleanup lock, so we
>>> currently perform it only during Vacuum and next split from the bucket
>>> which can have redundant overflow pages.
>>
>> Oops.  It was rather short-sighted of us not to increase
>> HASH_MAX_BITMAPS when we bumped HASH_VERSION.  Actually removing that
>> limit is hard, but we could have easily bumped it for 128 to say 1024
>> without (I think) causing any problem, which would have given us quite
>> a bit of headroom here.
>
> Yes, that sounds sensible, but I think it will just delay the problem
> to happen.  I think here the actual problem is that we are not able to
> perform squeeze operation often enough that it frees the overflow
> pages.  Currently, we try to perform the squeeze only at the start of
> next split of the bucket or during vacuum.  The reason for doing it
> that way was that squeeze operation needs cleanup lock and we already
> have that during the start of split and vacuum. Now, to solve it I
> have already speculated few ways above [1] and among those, it is
> feasible to either do this at end of split which can have performance
> implications in some work loads, but will work fine for the case
> reported in this thread
>

I have implemented the patch with this approach as the other approaches
require quite extensive changes, which I am not sure is the right thing
to do at this stage.

>
> I think we can fix it in one of above ways and increase the value of
> HASH_MAX_BITMAPS as well.
>

I have increased the number of hash bitmap pages as a separate patch.
I am not completely sure if it is a good idea to directly increase it
to 1024 as that will increase the size of hashmetapagedata from 960
bytes to 4544 bytes.  Shall we increase it to 512?
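
The jump from 960 to 4544 bytes is just the bitmap-page array in the
metapage growing; assuming each bitmap-page slot is a 4-byte block number,
the arithmetic works out as below (512 would land in between):

  select 960 + (1024 - 128) * 4 as metapage_bytes_at_1024,  -- 4544
         960 + ( 512 - 128) * 4 as metapage_bytes_at_512;   -- 2496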



-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Вложения

Re: [HACKERS] pgsql 10: hash indexes testing

От
Amit Kapila
Дата:
On Fri, Aug 4, 2017 at 9:19 AM, AP <ap@zip.com.au> wrote:
> On Fri, Aug 04, 2017 at 08:21:01AM +0530, Amit Kapila wrote:
>> Note - AP has off list shared the data dump and we (Ashutosh Sharma
>> and me) are able to reproduce the problem and we could see that if we
>> force vacuum via the debugger, then it is able to free overflow pages.
>> The exact numbers are not available at this stage as the test is not
>> complete.
>
> I've another if you would like it. I COPYed with FILLFACTOR of 10 and
> it eventually failed but I could not recreate the index (via CREATE INDEX
> CONCURRENTLY) with the data that made it using a fillfactor of 100. If
> I created the index again (again with the same data) with fillfactor 10
> then it completed.
>
> I may be completely misunderstanding fillfactor but I always thought it was
> a performance optimisation rather than something that may allow you to store
> more (or less) index entries.
>

It impacts the split behavior.  You can read about it at:
https://www.postgresql.org/docs/9.6/static/sql-createindex.html
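
In short, the fillfactor is fixed at index build time; a lower value spreads
the same entries over more bucket pages (as the bucket_pages numbers above
show), which changes how many overflow pages end up being needed.  The
syntax is simply (table/column names are placeholders, the index name is the
one from this thread):

  CREATE INDEX link_datum_id_idx
    ON the_indexed_table USING hash (datum_id)
    WITH (fillfactor = 10);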


>
> The DB in question is now gone but I took a copy of the column as per
> before so if you'd like it I can make it available via the same means.
>

Can you try with the patches posted above?


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] pgsql 10: hash indexes testing

От
Robert Haas
Дата:
On Fri, Aug 4, 2017 at 6:22 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> I have increased the number of hash bitmap pages as a separate patch.
> I am not completely sure if it is a good idea to directly increase it
> to 1024 as that will increase the size of hashmetapagedata from 960
> bytes to 4544 bytes.  Shall we increase it to 512?

I don't quite see what the problem is with increasing it to 4544
bytes.  What's your concern?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] pgsql 10: hash indexes testing

От
Robert Haas
Дата:
On Fri, Aug 4, 2017 at 6:22 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> I have implemented the patch with this approach as other approach
> require quite extensive changes which I am not sure is the right thing
> to do at this stage.

I think this approach is actually better anyway.  There's no guarantee
that VACUUM can be responsive enough to get the job done in time, work
items or no work items, and the split-cleanup is cheap enough that I
don't think we ought to worry about negative effects on foreground
workloads.  Sure, it could have some impact, but *less bloat* could
also have some impact in the other direction.

+        hashbucketcleanup(rel, obucket, bucket_obuf,
+                          BufferGetBlockNumber(bucket_obuf), NULL,
+                          maxbucket, highmask, lowmask, NULL, NULL, true,
+                          NULL, NULL);
+        LockBuffer(bucket_nbuf, BUFFER_LOCK_UNLOCK);

Don't we want to do those things in the other order?

- * which is passed in buffer nbuf, pinned and write-locked.  That lock and
- * pin are released here.  (The API is set up this way because we must do
- * _hash_getnewbuf() before releasing the metapage write lock.  So instead of
- * passing the new bucket's start block number, we pass an actual buffer.)
+ * which is passed in buffer nbuf, pinned and write-locked.  The lock will be
+ * released here and pin must be released by the caller.  (The API is set up
+ * this way because we must do _hash_getnewbuf() before releasing the metapage
+ * write lock.  So instead of passing the new bucket's start block number, we
+ * pass an actual buffer.)

Isn't the parenthesized part at the end of the comment wrong?  I
realize this patch isn't editing that part anyway except for
reflowing, but further up in that same block comment it says this:
 * The caller must hold a pin, but no lock, on the metapage buffer.
 * The buffer is returned in the same state.  (The metapage is only
 * touched if it becomes necessary to add or remove overflow pages.)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] pgsql 10: hash indexes testing

От
Alvaro Herrera
Дата:
Robert Haas wrote:

> I think this approach is actually better anyway.  There's no guarantee
> that VACUUM can be responsive enough to get the job done in time, work
> items or no work items,

Yeah, autovacuum work items don't have a guaranteed response time.
They're okay for things that "ought to be done eventually", but if the
condition causes a high-speed load to fail with errors, then halting the
load until the cleanup is done seems like the way to go.  Having to
randomly inject pauses in your workload so that autovacuum has time to
cope isn't great from the user's POV (a condition that is made worse if
you don't have any mechanism to detect that you need a pause, or how
long to pause for.)

Once that part is working well, you could as a convenience (to avoid or
reduce the stalls some of the time) add autovacuum work-item support.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] pgsql 10: hash indexes testing

От
Amit Kapila
Дата:
On Fri, Aug 4, 2017 at 11:15 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Aug 4, 2017 at 6:22 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> I have implemented the patch with this approach as other approach
>> require quite extensive changes which I am not sure is the right thing
>> to do at this stage.
>
> I think this approach is actually better anyway.  There's no guarantee
> that VACUUM can be responsive enough to get the job done in time, work
> items or no work items, and the split-cleanup is cheap enough that I
> don't think we ought to worry about negative effects on foreground
> workloads.  Sure, it could have some impact, but *less bloat* could
> also have some impact in the other direction.
>
> +        hashbucketcleanup(rel, obucket, bucket_obuf,
> +                          BufferGetBlockNumber(bucket_obuf), NULL,
> +                          maxbucket, highmask, lowmask, NULL, NULL, true,
> +                          NULL, NULL);
> +        LockBuffer(bucket_nbuf, BUFFER_LOCK_UNLOCK);
>
> Don't we want to do those things in the other order?
>

Earlier one of the callers was releasing the old bucket lock first, so
I just maintained that order, but I think it is better if we release
new bucket lock first. I think that will maintain the ordering and it
is better to do it before actual cleanup of the old bucket as cleanup
can take some time and there is no use of retaining a lock on the new
bucket for that time.  I will do some testing after making the change
and will post a patch in some time.

> - * which is passed in buffer nbuf, pinned and write-locked.  That lock and
> - * pin are released here.  (The API is set up this way because we must do
> - * _hash_getnewbuf() before releasing the metapage write lock.  So instead of
> - * passing the new bucket's start block number, we pass an actual buffer.)
> + * which is passed in buffer nbuf, pinned and write-locked.  The lock will be
> + * released here and pin must be released by the caller.  (The API is set up
> + * this way because we must do _hash_getnewbuf() before releasing the metapage
> + * write lock.  So instead of passing the new bucket's start block number, we
> + * pass an actual buffer.)
>
> Isn't the parenthesized part at the end of the comment wrong?  I
> realize this patch isn't editing that part anyway except for
> reflowing, but further up in that same block comment it says this:
>
>  * The caller must hold a pin, but no lock, on the metapage buffer.
>  * The buffer is returned in the same state.  (The metapage is only
>  * touched if it becomes necessary to add or remove overflow pages.)
>

The comment doesn't seem to be wrong but is not making much sense
here.  Both the actions (_hash_getnewbuf and the release of the
metapage lock) are done in the caller.  In 9.6, we were passing the old
bucket's block number and the new bucket's buffer, so the comment tries
to explain why it is passing a buffer for the new bucket.  It makes
less sense now because we are passing buffers for both the old and new
buckets, so we can remove it.  Do you want me to remove it as part of
this patch or as a separate patch?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] pgsql 10: hash indexes testing

От
Amit Kapila
Дата:
On Fri, Aug 4, 2017 at 11:45 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Fri, Aug 4, 2017 at 11:15 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Fri, Aug 4, 2017 at 6:22 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>> I have implemented the patch with this approach as other approach
>>> require quite extensive changes which I am not sure is the right thing
>>> to do at this stage.
>>
>> I think this approach is actually better anyway.  There's no guarantee
>> that VACUUM can be responsive enough to get the job done in time, work
>> items or no work items, and the split-cleanup is cheap enough that I
>> don't think we ought to worry about negative effects on foreground
>> workloads.  Sure, it could have some impact, but *less bloat* could
>> also have some impact in the other direction.
>>
>> +        hashbucketcleanup(rel, obucket, bucket_obuf,
>> +                          BufferGetBlockNumber(bucket_obuf), NULL,
>> +                          maxbucket, highmask, lowmask, NULL, NULL, true,
>> +                          NULL, NULL);
>> +        LockBuffer(bucket_nbuf, BUFFER_LOCK_UNLOCK);
>>
>> Don't we want to do those things in the other order?
>>
>
> Earlier one of the callers was releasing the old bucket lock first, so
> I just maintained that order, but I think it is better if we release
> new bucket lock first. I think that will maintain the ordering and it
> is better to do it before actual cleanup of the old bucket as cleanup
> can take some time and there is no use of retaining a lock on the new
> bucket for that time.  I will do some testing after making the change
> and will post a patch in some time.
>

Changed as per suggestion.

>> - * which is passed in buffer nbuf, pinned and write-locked.  That lock and
>> - * pin are released here.  (The API is set up this way because we must do
>> - * _hash_getnewbuf() before releasing the metapage write lock.  So instead of
>> - * passing the new bucket's start block number, we pass an actual buffer.)
>> + * which is passed in buffer nbuf, pinned and write-locked.  The lock will be
>> + * released here and pin must be released by the caller.  (The API is set up
>> + * this way because we must do _hash_getnewbuf() before releasing the metapage
>> + * write lock.  So instead of passing the new bucket's start block number, we
>> + * pass an actual buffer.)
>>
>> Isn't the parenthesized part at the end of the comment wrong?  I
>> realize this patch isn't editing that part anyway except for
>> reflowing, but further up in that same block comment it says this:
>>
>>  * The caller must hold a pin, but no lock, on the metapage buffer.
>>  * The buffer is returned in the same state.  (The metapage is only
>>  * touched if it becomes necessary to add or remove overflow pages.)
>>
>
> The comment doesn't seem to be wrong but is not making much sense
> here.  Both the actions _hash_getnewbuf and release of metapage lock
> is done in the caller.  In 9.6, we were passing old bucket's block
> number and new bucket's buffer, so the comment tries to explain why it
> is passing buffer for the new bucket.  It makes less sense now because
> we are passing buffer for both old and new buckets, so we can remove
> it.  Do you want me to remove it as part of this patch or as a
> separate patch?
>

I have not done anything for this comment as it doesn't sound wrong to
me.  I think it is not making much sense in the current code and we
can remove it or change it as part of the separate patch if you or
others think so.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Вложения

Re: [HACKERS] pgsql 10: hash indexes testing

От
Amit Kapila
Дата:
On Fri, Aug 4, 2017 at 10:59 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Aug 4, 2017 at 6:22 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> I have increased the number of hash bitmap pages as a separate patch.
>> I am not completely sure if it is a good idea to directly increase it
>> to 1024 as that will increase the size of hashmetapagedata from 960
>> bytes to 4544 bytes.  Shall we increase it to 512?
>
> I don't quite see what the problem is with increasing it to 4544
> bytes.  What's your concern?
>

Nothing as such.  It is just that the previous code might have had some
reason to keep it at 128: probably, if there are that many overflow
bucket pages, then it is better to reindex the index, otherwise the
performance might suck while traversing the long overflow chains.  In
general, I see your point that if we can let the user have that much
overflow space, then let's do it.
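
In this thread, "reindex" in practice meant what AP did above: build a
replacement index concurrently and swap it in, so writes are not blocked
during the build.  A sketch with placeholder table/column names:

  CREATE INDEX CONCURRENTLY link_datum_id_idx2
    ON the_indexed_table USING hash (datum_id) WITH (fillfactor = 10);
  DROP INDEX link_datum_id_idx;
  ALTER INDEX link_datum_id_idx2 RENAME TO link_datum_id_idx;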

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] pgsql 10: hash indexes testing

От
Robert Haas
Дата:
On Fri, Aug 4, 2017 at 2:49 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Fri, Aug 4, 2017 at 10:59 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Fri, Aug 4, 2017 at 6:22 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>> I have increased the number of hash bitmap pages as a separate patch.
>>> I am not completely sure if it is a good idea to directly increase it
>>> to 1024 as that will increase the size of hashmetapagedata from 960
>>> bytes to 4544 bytes.  Shall we increase it to 512?
>>
>> I don't quite see what the problem is with increasing it to 4544
>> bytes.  What's your concern?
>
> Nothing as such.  It is just that the previous code might have some
> reason to keep it at 128, probably if there are that many overflow
> bucket pages, then it is better to reindex the index otherwise the
> performance might suck while traversing the long overflow chains.  In
> general, I see your point that if we can provide user to have that big
> overflow space, then let's do it.

Yeah.  The only concern I see is that one doesn't want to run out of
space in the metapage when future patches come along and try to do new
things.  But 4544 bytes still leaves quite a bit of headroom, and I
think it's better to solve a problem we know we have right now than
try to save too much space for future needs that may or may not occur.
Maybe at some point this whole thing needs a broader rethink, but the
fact that somebody hit the current limit before we got out of beta
makes me think that we should be fairly aggressive in trying to
ameliorate the problem.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] pgsql 10: hash indexes testing

От
Robert Haas
Дата:
On Fri, Aug 4, 2017 at 2:45 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> I have not done anything for this comment as it doesn't sound wrong to
> me.  I think it is not making much sense in the current code and we
> can remove it or change it as part of the separate patch if you or
> others think so.

I don't get it.  The comment claims we have to _hash_getnewbuf before
releasing the metapage write lock.  But the function neither calls
_hash_getnewbuf nor releases the metapage write lock.  It then goes on
to say that for this reason we pass the new buffer rather than just
the block number.  However, even if it were true that we have to call
_hash_getnewbuf before releasing the metapage write lock, what does
that have to do with the choice of passing a buffer vs. a block
number?

Explain to me what I'm missing, please, because I'm completely befuddled!

(On another note, I committed these patches.)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] pgsql 10: hash indexes testing

От
Amit Kapila
Дата:
On Sat, Aug 5, 2017 at 7:50 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Aug 4, 2017 at 2:45 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> I have not done anything for this comment as it doesn't sound wrong to
>> me.  I think it is not making much sense in the current code and we
>> can remove it or change it as part of the separate patch if you or
>> others think so.
>
> I don't get it.  The comment claims we have to _hash_getnewbuf before
> releasing the metapage write lock.  But the function neither calls
> _hash_getnewbuf nor releases the metapage write lock.

Both the actions _hash_getnewbuf and release of the metapage lock are
done in the caller (_hash_expandtable).

>  It then goes on
> to say that for this reason we pass the new buffer rather than just
> the block number.  However, even if it were true that we have to call
> _hash_getnewbuf before releasing the metapage write lock, what does
> that have to do with the choice of passing a buffer vs. a block
> number?
>

For this, you have to look at the PG9.6 code.  In PG9.6, we were passing
the old bucket's block number and the new bucket's buffer, and the
reason is that in the caller (_hash_expandtable) we only have access to
a buffer for the new bucket's block.

> Explain to me what I'm missing, please, because I'm completely befuddled!
>

I think you need to compare the PG10 code with the PG9.6 code for the
functions _hash_splitbucket and _hash_expandtable.  I don't find this
comment very useful starting with PG10.  Patch attached to remove it.


> (On another note, I committed these patches.)
>

Thanks.



-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Вложения

Re: [HACKERS] pgsql 10: hash indexes testing

От
AP
Дата:
On Sat, Aug 05, 2017 at 04:41:24PM +0530, Amit Kapila wrote:
> > (On another note, I committed these patches.)
> 
> Thanks.

Seconded. :)

Now uploading data with fillfactor of 90. I'll know in 2-3 days
if the new patches are successful (earlier if they did not help).

I compiled (as apt.postgresql.org does not provide the latest
beta3 version for stretch; only sid which has a different perl
version) 10~beta3~20170805.2225-1~593.git0d1f98b.

AP



Re: [HACKERS] pgsql 10: hash indexes testing

От
AP
Дата:
On Sun, Aug 06, 2017 at 04:32:29PM +1000, AP wrote:
> On Sat, Aug 05, 2017 at 04:41:24PM +0530, Amit Kapila wrote:
> > > (On another note, I committed these patches.)
> > 
> > Thanks.
> 
> Seconded. :)
> 
> Now uploading data with fillfactor of 90. I'll know in 2-3 days
> if the new patches are successful (earlier if they did not help).
> 
> I compiled (as apt.postgresql.org does not provide the latest
> beta3 version for stretch; only sid which has a different perl
> version) 10~beta3~20170805.2225-1~593.git0d1f98b.

Have gotten success with a dataset that failed before with ff 10.

End result:

mdstash=# select * from pgstathashindex('link_datum_id_idx');
 version | bucket_pages | overflow_pages | bitmap_pages | unused_pages | live_items | dead_items |   free_percent
---------+--------------+----------------+--------------+--------------+------------+------------+------------------
       4 |     12391325 |        5148912 |          158 |       191643 | 4560007478 |          0 | 1894.29056075982
(1 row)

mdstash=# select 5148912.0/158/8;
        ?column?
------------------------
 4073.5063291139240000
(1 row)

The index is 135GB rather than 900GB (from memory/give or take).
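
A size like that can be read back at any time with the usual catalog helper
(nothing assumed here beyond the index name used throughout this thread):

  select pg_size_pretty(pg_relation_size('link_datum_id_idx'));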

I'm happy so far. I'll be seeing how much more I can dump into it this weekend. :)

Thanks muchly to the both of you. :)

AP



Re: [HACKERS] pgsql 10: hash indexes testing

От
Robert Haas
Дата:
On Thu, Aug 10, 2017 at 6:41 AM, AP <ap@zip.com.au> wrote:
> The index is 135GB rather than 900GB (from memory/give or take).

Whoa.  Big improvement.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] pgsql 10: hash indexes testing

От
AP
Дата:
On Thu, Aug 10, 2017 at 01:12:25PM -0400, Robert Haas wrote:
> On Thu, Aug 10, 2017 at 6:41 AM, AP <ap@zip.com.au> wrote:
> > The index is 135GB rather than 900GB (from memory/give or take).
> 
> Whoa.  Big improvement.

Not a good direct comparison in general but it fits my workload.

The 900GB was fillfactor 10 and the 135 was ff 90, BUT ff 90 on
v3 failed early into the import. Even ff 10 on v3 didn't succeed (came just
short). So for my usage I was facing having indexes with fillfactor 10
just to be able to put a more reasonable amount of data in them. Almost.

Now I don't have to as v4 copes with the load and more and in less disk
space so for me, the above is just lovely. :) This is even more so
given that the hash index v4 upload actually finished unlike the v3 one. :)

As I said in my last email, this weekend I'll be adding more to that table
so I'll see how far that takes me but the last two patches have given
me a great deal of confidence that the end result will be good news. :)

As an aside, btree for the above is around 2.5x bigger than hash v4 so 
chances are much better that a hash index will fit into ram which has
its own benefits. :)

AP



Re: [HACKERS] pgsql 10: hash indexes testing

От
Amit Kapila
Дата:
On Thu, Aug 10, 2017 at 4:11 PM, AP <ap@zip.com.au> wrote:
> On Sun, Aug 06, 2017 at 04:32:29PM +1000, AP wrote:
>> On Sat, Aug 05, 2017 at 04:41:24PM +0530, Amit Kapila wrote:
>> > > (On another note, I committed these patches.)
>> >
>> > Thanks.
>>
>> Seconded. :)
>>
>> Now uploading data with fillfactor of 90. I'll know in 2-3 days
>> if the new patches are successful (earlier if they did not help).
>>
>> I compiled (as apt.postgresql.org does not provide the latest
>> beta3 version for stretch; only sid which has a different perl
>> version) 10~beta3~20170805.2225-1~593.git0d1f98b.
>
> Have gotten success with a dataset that failed before with ff 10.
>
> End result:
>
> mdstash=# select * from pgstathashindex('link_datum_id_idx');
>  version | bucket_pages | overflow_pages | bitmap_pages | unused_pages | live_items | dead_items |   free_percent
> ---------+--------------+----------------+--------------+--------------+------------+------------+------------------
>        4 |     12391325 |        5148912 |          158 |       191643 | 4560007478 |          0 | 1894.29056075982
> (1 row)
>

The free_percent calculation seems to be wrong.  Can you please once
try after recent commit 0b7ba3d6474b8f58e74dba548886df3250805cdf?  I
feel this should be fixed by that commit.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] pgsql 10: hash indexes testing

От
Amit Kapila
Дата:
On Fri, Aug 11, 2017 at 5:01 AM, AP <ap@zip.com.au> wrote:
> On Thu, Aug 10, 2017 at 01:12:25PM -0400, Robert Haas wrote:
>> On Thu, Aug 10, 2017 at 6:41 AM, AP <ap@zip.com.au> wrote:
>> > The index is 135GB rather than 900GB (from memory/give or take).
>>
>> Whoa.  Big improvement.
>
>
> As an aside, btree for the above is around 2.5x bigger than hash v4 so
> chances are much better that a hash index will fit into ram which has
> its own benefits. :)
>

Yeah, that's exactly one of the benefits hash indexes can provide over
btree indexes.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] pgsql 10: hash indexes testing

От
AP
Дата:
On Fri, Aug 11, 2017 at 07:33:51AM +0530, Amit Kapila wrote:
> On Thu, Aug 10, 2017 at 4:11 PM, AP <ap@zip.com.au> wrote:
> > mdstash=# select * from pgstathashindex('link_datum_id_idx');
> >  version | bucket_pages | overflow_pages | bitmap_pages | unused_pages | live_items | dead_items |   free_percent
> >
---------+--------------+----------------+--------------+--------------+------------+------------+------------------
> >        4 |     12391325 |        5148912 |          158 |       191643 | 4560007478 |          0 |
1894.29056075982
> > (1 row)
> 
> The free_percent calculation seems to be wrong.  Can you please once
> try after recent commit 0b7ba3d6474b8f58e74dba548886df3250805cdf?  I
> feel this should be fixed by that commit.

Sorry I couldn't get to help you debugging this myself. Work got annoying. :/

That said, I think that this is the first time that I've seen the value be
under 100:

mdstash=# select * from pgstathashindex('link_datum_id_idx');
 version | bucket_pages | overflow_pages | bitmap_pages | unused_pages | live_items | dead_items |  free_percent
---------+--------------+----------------+--------------+--------------+------------+------------+------------------
       4 |     22957200 |        9272698 |          283 |      2208624 | 8448300552 |          0 | 39.8146658879555
(1 row)

Time: 2882974.635 ms (48:02.975)

The index is still functioning, too, with far more data than I've ever had
in the table in the past and well beyond the point where it would previously
die. Happy days. :)

AP



Re: [HACKERS] pgsql 10: hash indexes testing

От
Amit Kapila
Дата:
On Mon, Aug 14, 2017 at 6:10 AM, AP <ap@zip.com.au> wrote:
> On Fri, Aug 11, 2017 at 07:33:51AM +0530, Amit Kapila wrote:
>> On Thu, Aug 10, 2017 at 4:11 PM, AP <ap@zip.com.au> wrote:
>> > mdstash=# select * from pgstathashindex('link_datum_id_idx');
>> >  version | bucket_pages | overflow_pages | bitmap_pages | unused_pages | live_items | dead_items |   free_percent
>> >
---------+--------------+----------------+--------------+--------------+------------+------------+------------------
>> >        4 |     12391325 |        5148912 |          158 |       191643 | 4560007478 |          0 |
1894.29056075982
>> > (1 row)
>>
>> The free_percent calculation seems to be wrong.  Can you please once
>> try after recent commit 0b7ba3d6474b8f58e74dba548886df3250805cdf?  I
>> feel this should be fixed by that commit.
>
> Sorry I couldn't get to help you debugging this myself. Work got annoying. :/
>
> That said, I think that this is the first time that I've seen the value be
> under 100:
>

Thanks for confirming the same.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com