Re: Slow query - lots of temporary files.

Поиск
Список
Период
Сортировка
От Johann Spies
Тема Re: Slow query - lots of temporary files.
Дата
Msg-id CAGZ55DR2T6PVA8Zf4R-G5VXz25Q4VLGXVATJfWL2y0hwGD40CQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Slow query - lots of temporary files.  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Список pgsql-performance


On 10 June 2015 at 16:50, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:


The problematic piece of the explain plan is this:

 ->  Merge Join  (cost=4384310.92..21202716.78 rows=6664163593
                  width=390)"
       Output: a.ut, c.gt, b.go, b.gn, d.au"
       Merge Cond: ((c.ut)::text = (d.rart_id)::text)"

That is, the planner expects ~6.7 billion rows, each ~390B wide. That's ~2.5TB of data that needs to be stored to disk (so that the sort can process it).

The way the schema is designed might be one of the issues - ISTM the 'ut' column is somehow universal, mixing values referencing different columns in multiple tables. Not only that's utterly misleading for the planner (and may easily cause issues with huge intermediate results), but it also makes formulating the queries very difficult. And of course, the casting between text and int is not very good either.

Fix the schema to follow relational best practices - separate the values into multiple columns, and most of this will go away.

Thanks for your reply Tomas.

I do not understand what the problem with the 'ut' column is.  It is a unique identifier in the first table(africa_uts) and is used in the other tables to establish joins and does have the same type definition in all the tables.  Is the problem in the similar name.  The data refers in all the 'ut' columns of the different tables to the same data.  I do not casting of integers into text in this case.  I don't know why the planner is doing it.  The field 'rart_id' in isi.rauthor is just another name for 'ut' in the other tables and have the same datatype.

I do not understand your remark: "separate the values into multiple columns". I cannot see which values can be separated into different columns in the schema.  Do you mean in the query?  Why?


Johann
--
Because experiencing your loyal love is better than life itself,
my lips will praise you.  (Psalm 63:3)

В списке pgsql-performance по дате отправления:

Предыдущее
От: Sasa Vilic
Дата:
Сообщение: Slow query: Postgres chooses nested loop over hash join, whery by hash join is much faster, wrong number of rows estimated
Следующее
От: "Sheena, Prabhjot"
Дата:
Сообщение: pg bouncer issue what does sv_used column means