Обсуждение: Gist fastbuild and performances

Поиск
Список
Период
Сортировка

Gist fastbuild and performances

От
talk to ben
Дата:
Hi,

I am playing around with the gist fast build and comparing the result between v13 and v14.
The space gain and speed increase are really awesome.

When I compare the performance with the following script, I get a lot more data read into the buffers in v14 and a little slower query.

Is it expected ? (is the test dumb / too artificial ?)
(I found some discussion about the buffer usage but don't quite understand the outcome 

The script :

--pg14
\timing on
CREATE TABLE gist_fastbuild AS SELECT point(random(),random()) as pt FROM  generate_series(1,10000000,1);
CREATE INDEX ON gist_fastbuild USING gist (pt);
VACUUM ANALYZE gist_fastbuild;
\di+ gist_fastbuild_pt_idx
EXPLAIN (ANALYZE, BUFFERS) SELECT pt FROM gist_fastbuild WHERE pt <@ box(point(.5,.5), point(.75,.75));

COPY gist_fastbuild TO '/tmp/gist_fastbuild.copy';

--pg13
\timing on
CREATE TABLE gist_fastbuild(pt point);
COPY gist_fastbuild FROM '/tmp/gist_fastbuild.copy';
CREATE INDEX ON gist_fastbuild USING gist (pt);
VACUUM ANALYZE gist_fastbuild;
\di+ gist_fastbuild_pt_idx
EXPLAIN (ANALYZE, BUFFERS) SELECT pt FROM gist_fastbuild WHERE pt <@ box(point(.5,.5), point(.75,.75));

The explains :

V14# EXPLAIN (ANALYZE, BUFFERS) SELECT pt FROM gist_fastbuild WHERE pt <@ box(point(.5,.5), point(.75,.75));
                                                                       QUERY PLAN                                                                        
---------------------------------------------------------------------------------------------------------------------------------------------------------
 Index Only Scan using gist_fastbuild_pt_idx on gist_fastbuild  (cost=0.42..419.42 rows=10000 width=16) (actual time=0.350..129.309 rows=626005 loops=1)
   Index Cond: (pt <@ '(0.75,0.75),(0.5,0.5)'::box)
   Heap Fetches: 0
   Buffers: shared hit=303083
 Planning:
   Buffers: shared hit=13
 Planning Time: 0.454 ms
 Execution Time: 148.611 ms
(8 rows)

V13# EXPLAIN (ANALYZE, BUFFERS) SELECT pt FROM gist_fastbuild WHERE pt <@ box(point(.5,.5), point(.75,.75));
                                                                       QUERY PLAN                                                                        
---------------------------------------------------------------------------------------------------------------------------------------------------------
 Index Only Scan using gist_fastbuild_pt_idx on gist_fastbuild  (cost=0.42..539.42 rows=10000 width=16) (actual time=0.523..107.393 rows=626005 loops=1)
   Index Cond: (pt <@ '(0.75,0.75),(0.5,0.5)'::box)
   Heap Fetches: 0
   Buffers: shared hit=17334
 Planning:
   Buffers: shared hit=13
 Planning Time: 0.396 ms
 Execution Time: 126.713 ms
(8 rows)

Re: Gist fastbuild and performances

От
Francisco Olarte
Дата:
Ben:

On Fri, 8 Oct 2021 at 13:14, talk to ben <blo.talkto@gmail.com> wrote:
> I am playing around with the gist fast build and comparing the result between v13 and v14.
...
> When I compare the performance with the following script, I get a lot more data read into the buffers in v14 and a
littleslower query.
 
...
> Is it expected ? (is the test dumb / too artificial ?)

I'm not knowlegeable enough in gist or low level stuff, but I would
suggest you run the same scripts in both, just in case the copy/create
is influencing anything. I mean, once you have the scripts, do ..

> \timing on
> CREATE TABLE gist_fastbuild AS SELECT point(random(),random()) as pt FROM  generate_series(1,10000000,1);
> CREATE INDEX ON gist_fastbuild USING gist (pt);
> VACUUM ANALYZE gist_fastbuild;
> \di+ gist_fastbuild_pt_idx
..
> COPY gist_fastbuild TO '/tmp/gist_fastbuild.copy';

On a scratch database, and then run the same script:

> --pg13
> \timing on
> CREATE TABLE gist_fastbuild(pt point);
> COPY gist_fastbuild FROM '/tmp/gist_fastbuild.copy';
> CREATE INDEX ON gist_fastbuild USING gist (pt);
> VACUUM ANALYZE gist_fastbuild;
> \di+ gist_fastbuild_pt_idx
> EXPLAIN (ANALYZE, BUFFERS) SELECT pt FROM gist_fastbuild WHERE pt <@ box(point(.5,.5), point(.75,.75));

a couple times or more on each server to rule artifacts out.

( testing speed in general, which such fast times, I myself would
doubt comparisons of anything but exact runs )

It may give the same results, but then you know it does.


Francisco Olarte.



Re: Gist fastbuild and performances

От
talk to ben
Дата:
Hi, thanks for the answer.

I ran the same scipt on both sides at first.
Then I switched to a COPY because the CREATE TABLE generates random data.
Since I got weird results, I wanted to be sure I had the same data on both versions.

I ran the tests several times (and even asked a collegue to do it on his laptop) with the same results.

I totally agree that the query time are small and difficult to compare (but they are consistent across runs).
I am just surprised that we have to access x17 pages for the same result on version 14. On bigger queries
it could count. I just wanted to know if it's a know tradeoff of this new feature.

Re: Gist fastbuild and performances

От
Francisco Olarte
Дата:
Ben:

On Fri, 8 Oct 2021 at 15:52, talk to ben <blo.talkto@gmail.com> wrote:

> I ran the same scipt on both sides at first.
> Then I switched to a COPY because the CREATE TABLE generates random data.

Logical, just pointing that you missed one side is generating, the
other is copying. Since I read a lot of things about selects updating
rows due to whatever bits affecting HOT and blah-blah-blah I suggested
doing the same script but with non-random data, hence the
generate+test-copy-13, test-copy-14 approach as an easy way to
eliminate potential discrepancies without having to look a lot of
dirty details.

> Since I got weird results, I wanted to be sure I had the same data on both versions.
> I ran the tests several times (and even asked a collegue to do it on his laptop) with the same results.

If you ran the test as written, you where comparing run times of
different sequence of operations. If you know the sequence of
operation does not matter all is fine and dandy. I do not, ( I suspect
is does not matter, but I do not know ) so I suggested to run the same
sequences ( as to me it seems to be easy & fast, given the timing you
gave and assuming no untimed operation was long ) and not worry about
differences.

> I totally agree that the query time are small and difficult to compare (but they are consistent across runs).
> I am just surprised that we have to access x17 pages for the same result on version 14. On bigger queries
> it could count. I just wanted to know if it's a know tradeoff of this new feature.

If you do not run the same sequences, you do not know. Note I do not
know what exact sequences you have tested, I write with only what I
have read as as input.

Francisco Olarte.



Re: Gist fastbuild and performances

От
talk to ben
Дата:
On Fri, Oct 8, 2021 at 4:33 PM Francisco Olarte <folarte@peoplecall.com> wrote:
If you do not run the same sequences, you do not know. Note I do not
know what exact sequences you have tested, I write with only what I
have read as as input.

I ran this on both versions on fresh instances / clusters with the data from a previous run :

\timing on
CREATE TABLE gist_fastbuild(pt point);
COPY gist_fastbuild FROM '/tmp/gist_fastbuild.copy';
CREATE INDEX ON gist_fastbuild USING gist (pt);
VACUUM ANALYZE gist_fastbuild;
\di+ gist_fastbuild_pt_idx
EXPLAIN (ANALYZE, BUFFERS) SELECT pt FROM gist_fastbuild WHERE pt <@ box(point(.5,.5), point(.75,.75));

I get the same results as before.