Re: [HACKERS] pgsql 10: hash indexes testing

From: AP
Subject: Re: [HACKERS] pgsql 10: hash indexes testing
Msg-id: 20170706040221.coywudofctugy7tt@zip.com.au
In reply to: Re: [HACKERS] pgsql 10: hash indexes testing  (Amit Kapila <amit.kapila16@gmail.com>)
List: pgsql-hackers
On Thu, Jul 06, 2017 at 08:52:03AM +0530, Amit Kapila wrote:
> On Thu, Jul 6, 2017 at 2:40 AM, AP <ap@zip.com.au> wrote:
> > On Wed, Jul 05, 2017 at 05:52:32PM +0530, Amit Kapila wrote:
> >> >> >  version | bucket_pages | overflow_pages | bitmap_pages | unused_pages | live_items | dead_items |   free_percent
> >> >> > ---------+--------------+----------------+--------------+--------------+------------+------------+------------------
> >> >> >        3 |     10485760 |        2131192 |           66 |            0 | 2975444240 |          0 | 1065.19942179026
> >> >> > (1 row)
> > ...
> >> >> > And I do appear to have an odd percentage of free space. :)
> >>
> >> Are you worried about "unused_pages"? If so, then this is not a major
> >
> > Nope. "free_percent" Just a bit weird that I have it at over 1000% free. :)
> > Shouldn't that number be < 100?
> 
> Yes, there seems to be some gotcha in free percent calculation.  Is it
> possible for you to debug or in some way share the test?

I can try to debug but I need to know what to look for and how. If it
requires data reloads then that's around 12-15 hours per hit.

As for sharing the test, that'd mean sharing the data. If it helps I can
provide the content of that column, but you're looking at an SQL dump that
is roughly (2*64+1)*2.3 billion bytes (call it 300GB, give or take a (few)
billion) in size. :)
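
If a synthetic stand-in is any use in the meantime, something shaped
roughly like the real data could be knocked up along these lines. All of
it is illustrative: the names, key width and row count are made up, and
the real load goes through our import script rather than a single INSERT:

    CREATE EXTENSION IF NOT EXISTS pgstattuple;

    CREATE TABLE link_repro (datum_id text);

    -- build the index first, then load, to mimic the incremental import
    CREATE INDEX link_repro_datum_id_idx ON link_repro
        USING hash (datum_id) WITH (fillfactor = 90);

    -- two md5s concatenated give a 64-char hex key as a stand-in;
    -- the real column is wider and there are ~2.3 billion rows
    INSERT INTO link_repro (datum_id)
    SELECT md5(i::text) || md5((i + 1)::text)
    FROM generate_series(1, 100000000) AS s(i);

    SELECT * FROM pgstathashindex('link_repro_datum_id_idx');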

> > Well, if this is the cause of my little issue, it might be nice. ATM
> > my import script bombs out on errors (that I've duplicated! :) It took
> > 11 hours but it bombed) and it sounds like I'll need to do a manual
> > VACUUM before it can be run again.
> >
> 
> Yeah, I think after manual vacuum you should be able to proceed.

I don't think that'll help. I did a small check which I hope is useful in
seeing whether it will. Working off a similar db whose load completed (it
was smaller, and I did not want to mess with my one copy of the broken db),
I got the following results:

Pre-VACUUM:
-----------
# \di+
                               List of relations
 Schema |       Name        | Type  |   Owner   |  Table  |  Size   | Description
--------+-------------------+-------+-----------+---------+---------+-------------
 ...
 public | link_datum_id_idx | index | mdkingpin | link    | 90 GB   |
 ...

# select * from pgstathashindex('link_datum_id_idx');
 version | bucket_pages | overflow_pages | bitmap_pages | unused_pages | live_items | dead_items |  free_percent
---------+--------------+----------------+--------------+--------------+------------+------------+-----------------
       3 |      7611595 |        3451261 |          106 |       777013 | 3076131325 |          0 | 1512.8635780908

# vacuum VERBOSE ANALYZE link;
INFO:  vacuuming "public.link"
INFO:  "link": found 0 removable, 2272156152 nonremovable row versions in 120507771 out of 123729466 pages
DETAIL:  0 dead row versions cannot be removed yet, oldest xmin: 8594
There were 0 unused item pointers.
Skipped 0 pages due to buffer pins, 0 frozen pages.
0 pages are entirely empty.
CPU: user: 457.16 s, system: 755.84 s, elapsed: 4196.75 s.
INFO:  vacuuming "pg_toast.pg_toast_183727891"
INFO:  index "pg_toast_183727891_index" now contains 1441820 row versions in 3956 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU: user: 0.02 s, system: 0.00 s, elapsed: 0.56 s.
INFO:  "pg_toast_183727891": found 0 removable, 1441820 nonremovable row versions in 332271 out of 332271 pages
DETAIL:  0 dead row versions cannot be removed yet, oldest xmin: 8594
There were 0 unused item pointers.
Skipped 0 pages due to buffer pins, 0 frozen pages.
0 pages are entirely empty.
CPU: user: 1.16 s, system: 2.26 s, elapsed: 22.80 s.
INFO:  analyzing "public.link"
INFO:  "link": scanned 3000000 of 123729466 pages, containing 56661337 live rows and 0 dead rows; 3000000 rows in
sample,2330296882 estimated total rows
 
VACUUM    
Time: 7057484.079 ms (01:57:37.484)

Post-VACUUM:
------------
# \di+
 Schema |       Name        | Type  |   Owner   |  Table  |  Size   | Description
--------+-------------------+-------+-----------+---------+---------+-------------
 public | link_datum_id_idx | index | mdkingpin | link    | 90 GB   |

# select * from pgstathashindex('link_datum_id_idx');
 version | bucket_pages | overflow_pages | bitmap_pages | unused_pages | live_items | dead_items |  free_percent
---------+--------------+----------------+--------------+--------------+------------+------------+-----------------
       3 |      7611595 |        3451261 |          106 |       777013 | 3076131325 |          0 | 1512.8635780908

The end results are the same.

Then I did:
# CREATE INDEX CONCURRENTLY ON link USING hash (datum_id) WITH (fillfactor = 90);
CREATE INDEX
Time: 12545612.560 ms (03:29:05.613)
# \di+
                                   List of relations
 Schema |        Name        | Type  |   Owner   |  Table  |  Size   | Description
--------+--------------------+-------+-----------+---------+---------+-------------
 public | link_datum_id_idx  | index | mdkingpin | link    | 90 GB   |
 public | link_datum_id_idx1 | index | mdkingpin | link    | 71 GB   |

# select * from pgstathashindex('link_datum_id_idx1');
 version | bucket_pages | overflow_pages | bitmap_pages | unused_pages | live_items | dead_items |   free_percent
---------+--------------+----------------+--------------+--------------+------------+------------+------------------
       3 |      7340032 |        2014699 |           62 |            0 | 2336759432 |          0 | 909.757757369414

That's markedly different. At a rough estimate I should be able to double
the row count before I hit the "real" limits of a hash index.

When you refer to VACUUM do you mean VACUUM FULL? If so, that's going to
get nasty, as the table is almost 1TB in size.
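
Just to make sure we're talking about the same thing, the two variants as
I understand them:

    -- what I ran above; as the numbers show, it doesn't shrink the hash index:
    VACUUM VERBOSE ANALYZE link;

    -- the full variant: rewrites the (almost 1TB) table and all of its indexes
    -- while holding an ACCESS EXCLUSIVE lock, hence the "nasty":
    VACUUM FULL VERBOSE link;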

> >From above stats, it is clear that you have hit the maximum number of
> overflow pages we can support today.  Now, here one can argue that we
> should increase the limit of overflow pages in hash index which we can
> do, but I think you can again hit such a problem after some more time.

True, though I'm not far off hitting it "for real" at the present limits,
so an increase would be of benefit in other respects (i.e. not needing to
manage as many tables, since at present we have to split a set off due to
index limits).

> So at this stage, there are two possibilities for you (a) run manual
> Vacuum in-between (b) create the index after bulk load.  In general,
> whatever I have mentioned in (b) is a better way for bulk loading.
> Note here again the free_percent seems to be wrong.

If you didn't mean VACUUM FULL then (a) does not appear to work and (b)
would kill usability of the db during import, which would happen daily
(though with a vastly reduced data size). It also messes with the 
permission model that has been set up for the least-trusted section of
the project (at the moment that section can only INSERT).
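
For reference, my reading of (b) is roughly the following per import run
(the load step is illustrative; ours goes through the import script):

    -- drop the hash index before the daily bulk load...
    DROP INDEX CONCURRENTLY IF EXISTS link_datum_id_idx;

    -- ...do the load (illustrative; the real one is the import script doing INSERTs)...
    \copy link FROM 'daily_load.csv' WITH (FORMAT csv)

    -- ...then rebuild the index once the data is in:
    CREATE INDEX CONCURRENTLY link_datum_id_idx ON link
        USING hash (datum_id) WITH (fillfactor = 90);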

Still, I may wind up going with (b) if a VACUUM FULL is the only other
real choice but would prefer to avoid it. The fact that the index is
around 300GB smaller (so far) than btree may well be worth the pain
all by its lonesome.

AP


