Thread: Shards + hash = forever running queries


Shards + hash = forever running queries

From: Daniele Varrazzo
Hello,

We are using Postgres 9.1.4. We are struggling with a class of queries
that became impossible to run after sharding a large table. Queries
like:

    select small.something, big.anything
    from small join big on small.big_id = big.id;

and variations such as "select * from big where id in (select big_id from small)".

Since "big" was sharded, the query plan results in something like:

 Hash Join  (cost=10000000001.23..30038997974.72 rows=10 width=753)
   Hash Cond: (b.id = i.big_id)
   ->  Append  (cost=0.00..20038552251.23 rows=118859245 width=11)
         ->  Index Scan using big_201207_pkey on big_201207 b
(cost=0.00..2224100.46 rows=1609634 width=12)
         ->  Index Scan using big_201101_pkey on big_201101 b
(cost=0.00..404899.71 rows=5437497 width=12)
         ->  Index Scan using big_201104_pkey on big_201104 b
(cost=0.00..349657.58 rows=4625181 width=12)
         -> [...all the shards]
   ->  Hash  (cost=10000000001.10..10000000001.10 rows=10 width=742)
         ->  Seq Scan on small i  (cost=10000000000.00..10000000001.10
rows=10 width=742)

Postgres ends up reading forever: even though "small" has only three
rows, I've never seen such a query finish; the elapsed time exceeds
even a full scan of "big".

The plan looks sub-optimal: it appears to first perform a huge index
scan of all the partitions, then join the result against a small hash.

1. Can we fix the queries to work around this problem?

2. Could the planner be fixed for this scenario for PG 9.2 (or 9.3)?
Creating the hash beforehand, performing a hash join for each
partition, and merging the results looks like it would bring it back
into the realm of runnable queries. Am I wrong?

Thank you very much.

-- Daniele

Re: Shards + hash = forever running queries

From: Daniele Varrazzo
On Mon, Jul 23, 2012 at 11:03 AM, Daniele Varrazzo
<daniele.varrazzo@gmail.com> wrote:

> 1. Can we fix the queries to work around this problem?

As a stop-gap measure I've defined a get_big(id) function and am using
it to pull in the details we're interested in from the "big" table:

    create function get_big (id int) returns big as $$
    select * from big where id = $1;
    $$ language sql stable strict;
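A sketch of how the workaround might be used (the column names on
"small" are assumed for illustration): each row of "small" fetches its
"big" row through the function, so the lookup is planned as a plain
primary-key probe per call instead of an append-then-hash-join over
every shard:

    -- Hypothetical usage; "something" and "big_id" are assumed columns.
    -- Wrapping in a subquery before expanding the composite avoids
    -- pre-9.3 behaviour where (get_big(x)).* re-evaluates the function
    -- once per output column.
    select something, (b).*
    from (
        select s.something, get_big(s.big_id) as b
        from small s
        offset 0  -- fence: keep the function from being inlined/re-expanded
    ) sub;

Note that in 9.1 there is no LATERAL, so the composite-returning
function is the simplest way to force a per-row index lookup.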

I'm not completely satisfied with it, though: if there's a better
solution I'd be happy to know.

Thank you,

-- Daniele

Re: Shards + hash = forever running queries

From: Tom Lane
Daniele Varrazzo <daniele.varrazzo@gmail.com> writes:
> Since "big" was sharded, the query plan results in something like:

>  Hash Join  (cost=10000000001.23..30038997974.72 rows=10 width=753)
>    Hash Cond: (b.id = i.big_id)
>    ->  Append  (cost=0.00..20038552251.23 rows=118859245 width=11)
>          ->  Index Scan using big_201207_pkey on big_201207 b
> (cost=0.00..2224100.46 rows=1609634 width=12)
>          ->  Index Scan using big_201101_pkey on big_201101 b
> (cost=0.00..404899.71 rows=5437497 width=12)
>          ->  Index Scan using big_201104_pkey on big_201104 b
> (cost=0.00..349657.58 rows=4625181 width=12)
>          -> [...all the shards]
>    ->  Hash  (cost=10000000001.10..10000000001.10 rows=10 width=742)
>          ->  Seq Scan on small i  (cost=10000000000.00..10000000001.10
> rows=10 width=742)

[ squint... ]  9.1 certainly ought to be able to find a smarter plan for
such a case.  For instance, if I try this on 9.1 branch tip:

regression=# create table p (id int primary key);
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "p_pkey" for table "p"
CREATE TABLE
regression=# create table c1 (primary key (id)) inherits(p);
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "c1_pkey" for table "c1"
CREATE TABLE
regression=# create table c2 (primary key (id)) inherits(p);
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "c2_pkey" for table "c2"
CREATE TABLE
regression=# explain select * from p,int4_tbl where id=f1;
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..53.25 rows=120 width=8)
   Join Filter: (public.p.id = int4_tbl.f1)
   ->  Seq Scan on int4_tbl  (cost=0.00..1.05 rows=5 width=4)
   ->  Append  (cost=0.00..10.40 rows=3 width=4)
         ->  Index Scan using p_pkey on p  (cost=0.00..1.87 rows=1 width=4)
               Index Cond: (id = int4_tbl.f1)
         ->  Index Scan using c1_pkey on c1 p  (cost=0.00..4.27 rows=1 width=4)
               Index Cond: (id = int4_tbl.f1)
         ->  Index Scan using c2_pkey on c2 p  (cost=0.00..4.27 rows=1 width=4)
               Index Cond: (id = int4_tbl.f1)
(10 rows)

You have evidently got enable_seqscan turned off, so I wonder whether
the cost penalties applied by that are swamping the estimates.  Do you
get any better results if you re-enable that?
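The check Tom suggests can be done in a single session (a sketch; the
joined query is the one from the original report, and the table and
column names come from there):

    show enable_seqscan;      -- 'off' would explain the 10000000000.00 cost floor
    set enable_seqscan = on;  -- re-enable for this session only
    explain
    select small.something, big.anything
    from small join big on small.big_id = big.id;

The telltale sign in the original plan is the startup cost of
10000000000.00 on the Seq Scan node: that is the disable_cost penalty
the planner adds when a plan type has been turned off.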

            regards, tom lane

Re: Shards + hash = forever running queries

From: Daniele Varrazzo
On Mon, Jul 23, 2012 at 4:07 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Daniele Varrazzo <daniele.varrazzo@gmail.com> writes:
>> Since "big" was sharded, the query plan results in something like:
>> [ugly]
>
> [ squint... ]  9.1 certainly ought to be able to find a smarter plan for
> such a case.  For instance, if I try this on 9.1 branch tip:
> [good]
>
> You have evidently got enable_seqscan turned off, so I wonder whether
> the cost penalties applied by that are swamping the estimates.  Do you
> get any better results if you re-enable that?

Hello Tom, thank you for your help.

Actually, I don't know what to say. seqscans were most likely enabled
when the problem showed up. I may have disabled them for testing in my
session, and the plan I copied may have been generated with seqscans
disabled; but the original problem (the query never completing) was
reported in different sessions by different people, so the only
remaining explanation would be that seqscans were disabled in the
config file... which, I have since confirmed, was not the case. I
hadn't checked whether they were disabled in my session before
explicitly disabling them for testing.

As a matter of fact, after reading your reply I tested the query
again... and it was fast, the plan being the nested loop of your
example. :-\ What can I say, thank you for your radiation...

I've tried reverting other schema changes we made yesterday, but I
haven't been able to reproduce the original slowness. If we find
anything that might be useful to Postgres, I'll report it back.

Have a nice day,

-- Daniele