Shards + hash = forever running queries

Поиск
Список
Период
Сортировка
От Daniele Varrazzo
Тема Shards + hash = forever running queries
Дата
Msg-id CA+mi_8ZWi8MUvpuiqse+wYMreBF_j0vLDK35=H-ALwfVm3fPmA@mail.gmail.com
обсуждение исходный текст
Ответы Re: Shards + hash = forever running queries
Re: Shards + hash = forever running queries
Список pgsql-performance
Hello,

We are using Postgres 9.1.4. We are struggling with a class of queries
that got impossible to run after sharding a large table. Everything
like:

    select small.something, big.anything
    from small join big on small.big_id = big.id;

and variation such as "select * from big where id in (select big_id from small)"

Since "big" was sharded, the query plan results in something like:

 Hash Join  (cost=10000000001.23..30038997974.72 rows=10 width=753)
   Hash Cond: (b.id = i.big_id)
   ->  Append  (cost=0.00..20038552251.23 rows=118859245 width=11)
         ->  Index Scan using big_201207_pkey on big_201207 b
(cost=0.00..2224100.46 rows=1609634 width=12)
         ->  Index Scan using big_201101_pkey on big_201101 b
(cost=0.00..404899.71 rows=5437497 width=12)
         ->  Index Scan using big_201104_pkey on big_201104 b
(cost=0.00..349657.58 rows=4625181 width=12)
         -> [...all the shards]
   ->  Hash  (cost=10000000001.10..10000000001.10 rows=10 width=742)
         ->  Seq Scan on small i  (cost=10000000000.00..10000000001.10
rows=10 width=742)

Postgres ends up in never-ending reads: even if "small" has only three
rows I've never seen such query finishing, the time passed being even
longer than a full scan on big.

The plan looks sub-optimal, as it seems it first does a huge indexscan
of all the partitions, then it joins the result against a small hash.

1. Can we fix the queries to work around this problem?

2. Could the planner be fixed for this scenario for PG 9.2 (or 9.3)?
Creating the hash beforehand, performing an hash join for each
partition and merging the results looks like it would bring it back
into the realm of the runnable queries. Am I wrong?

Thank you very much.

-- Daniele

В списке pgsql-performance по дате отправления:

Предыдущее
От: Jim Vanns
Дата:
Сообщение: Odd blocking (or massively latent) issue - even with EXPLAIN
Следующее
От: Daniele Varrazzo
Дата:
Сообщение: Re: Shards + hash = forever running queries