Re: selectivity function

Поиск
Список
Период
Сортировка
От Greg Hennessy
Тема Re: selectivity function
Дата
Msg-id e23e9771-463f-c189-8502-f2c3c748f8fb@gmail.com
обсуждение исходный текст
Ответ на Re: selectivity function  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
On Thu, May 26, 2022 at 3:10 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Can you do anything useful with attaching selectivity estimates
> to the functions it references, instead?
I may have been doing down a bad path before. The function I'm
working to improve has five argument, the last being "degrees", which
is the match radius. Obviously a larger match radius  should cause more
matches.

For a small value of a match radius (0.005 degrees):

q3c_test=# explain (analyze, buffers) select * from test as a, test1 as 
b where q3c_join(a.ra,a.dec,b.ra,b.dec,.005);
QUERY PLAN
Nested Loop  (cost=92.28..22787968818.00 rows=5 width=32) (actual 
time=7.799..10758.566 rows=31 loops=1)
   Buffers: shared hit=8005684
   ->  Seq Scan on test a  (cost=0.00..15406.00 rows=1000000 width=16) 
(actual time=0.008..215.570 rows=1000000 loops=1)
         Buffers: shared hit=5406
   ->  Bitmap Heap Scan on test1 b  (cost=92.28..22785.45 rows=250 
width=16) (actual time=0.009..0.009 rows=0 loops=1000000)

(note: I deleted some of the output, since I think I'm keeping the 
important bits)

So, the cost of the query is calculated as 2e10, where it expect five rows,
found 31, and a hot cache of reading 8 million units of disk space, I'd have
to check the fine manual to remind myself of the units of that.

When I do the same sort of query on a much larger match radius (5 deg) I 
get:
q3c_test=# explain (analyze, buffers) select * from test as a, test1 as 
b where q3c_join(a.ra,a.dec,b.ra,b.dec,5);
QUERY PLAN
Nested Loop  (cost=92.28..22787968818.00 rows=4766288 width=32) (actual 
time=0.086..254995.691 rows=38051626 loops=1)
   Buffers: shared hit=104977026
   ->  Seq Scan on test a  (cost=0.00..15406.00 rows=1000000 width=16) 
(actual time=0.008..261.425 rows=1000000 loops=1)
         Buffers: shared hit=5406
   ->  Bitmap Heap Scan on test1 b  (cost=92.28..22785.45 rows=250 
width=16) (actual time=0.053..0.247 rows=38 loops=1000000)

The "total cost" is the same identical 2e10, this time the number of 
rows expectd
is 4.7 million, the number of rows delivered is 38 million (so the 
calculation is off
by a factor of 8, I'm not sure that is important), but the io is now 104 
million units.
So while we are doing a lot more IO, and dealing with a lot more rows, the
calculated cost is identical. That seems strange me me. Is that a normal 
thing?
Is it possible that the cost calculation isn't including the selectivity 
calculation?

Greg





В списке pgsql-hackers по дате отправления:

Предыдущее
От: Andres Freund
Дата:
Сообщение: Re: suboverflowed subtransactions concurrency performance optimize
Следующее
От: Masahiko Sawada
Дата:
Сообщение: Re: Support logical replication of DDLs