Обсуждение: [GENERAL] a JOIN to a VIEW seems slow

Поиск
Список
Период
Сортировка

[GENERAL] a JOIN to a VIEW seems slow

От
"Frank Millman"
Дата:
Hi all
 
This is a follow-up to a recent question I posted regarding a slow query. I thought that the slowness was caused by the number of JOINs in the query, but with your assistance I have found the true reason. I said in the previous thread that the question had become academic, but now that I understand things better, it is no longer academic as it casts doubt on my whole approach.
 
I have split my AR transaction table into three physical tables – ar_tran_inv, ar_tran_crn, ar_tran_rec. I will probably add others at some point, such as ar_tran_jnl.
 
I then create a VIEW to view all transactions combined. The view is created like this -
 
CREATE VIEW ar_trans AS
  SELECT ‘ar_inv’ AS tran_type, row_id AS tran_row_id, tran_number ... FROM ar_tran_inv WHERE posted = ‘1’
  UNION ALL
  SELECT ‘ar_crn’ AS tran_type, row_id AS tran_row_id, tran_number ... FROM ar_tran_crn WHERE posted = ‘1’
  UNION ALL
  SELECT ‘ar_rec’ AS tran_type, row_id AS tran_row_id, tran_number ... FROM ar_tran_rec WHERE posted = ‘1’
 
I have another table called ‘ar_trans_due’, to keep track of outstanding transactions. All of the three transaction types generate entries into this table. To identify the source of the transaction, I have created columns in ar_trans_due called ‘tran_type’ and ‘tran_row_id’. After inserting a row into ‘ar_tran_inv’, I invoke this -
 
  INSERT INTO ar_trans_due (tran_type, tran_row_id, ...) VALUES (‘ar_inv’, ar_tran_inv.row_id, ...), and similar for the other transaction types. It is handled by a Python program, and it all happens within a transaction.
 
When I view a row in ar_trans_due, I want to retrieve data from the source transaction, so I have this -
 
  SELECT * FROM ar_trans_due a
  LEFT JOIN ar_trans b ON b.tran_type = a.tran_type AND b.tran_row_id = a.tran_row_id
 
I understand that PostgreSQL must somehow follow a path from the view ‘ar_trans’ to the physical table ‘ar_tran_inv’, but I assumed it would execute the equivalent of SELECT * FROM ar_tran_inv WHERE row_id = a.tran_row_id AND posted = ‘1’.
 
If this was the case, it would be an indexed read, and very fast. Instead, according to EXPLAIN, it performs a sequential scan of the ‘ar_tran_inv’ table.
 
It also scans ‘ar_tran_crn’ and ‘ar_tran_rec’, but EXPLAIN shows that it uses a Bitmap Heap Scan on those. I assume that is because the tables are currently empty.
 
Is this analysis correct?
 
If so, is there any way to force it to use an indexed read?
 
Thanks for any pointers.
 
Frank Millman
 

Re: [GENERAL] a JOIN to a VIEW seems slow

От
Pavel Stehule
Дата:


2017-09-14 10:14 GMT+02:00 Frank Millman <frank@chagford.com>:
Hi all
 
This is a follow-up to a recent question I posted regarding a slow query. I thought that the slowness was caused by the number of JOINs in the query, but with your assistance I have found the true reason. I said in the previous thread that the question had become academic, but now that I understand things better, it is no longer academic as it casts doubt on my whole approach.
 
I have split my AR transaction table into three physical tables – ar_tran_inv, ar_tran_crn, ar_tran_rec. I will probably add others at some point, such as ar_tran_jnl.
 
I then create a VIEW to view all transactions combined. The view is created like this -
 
CREATE VIEW ar_trans AS
  SELECT ‘ar_inv’ AS tran_type, row_id AS tran_row_id, tran_number ... FROM ar_tran_inv WHERE posted = ‘1’
  UNION ALL
  SELECT ‘ar_crn’ AS tran_type, row_id AS tran_row_id, tran_number ... FROM ar_tran_crn WHERE posted = ‘1’
  UNION ALL
  SELECT ‘ar_rec’ AS tran_type, row_id AS tran_row_id, tran_number ... FROM ar_tran_rec WHERE posted = ‘1’
 
I have another table called ‘ar_trans_due’, to keep track of outstanding transactions. All of the three transaction types generate entries into this table. To identify the source of the transaction, I have created columns in ar_trans_due called ‘tran_type’ and ‘tran_row_id’. After inserting a row into ‘ar_tran_inv’, I invoke this -
 
  INSERT INTO ar_trans_due (tran_type, tran_row_id, ...) VALUES (‘ar_inv’, ar_tran_inv.row_id, ...), and similar for the other transaction types. It is handled by a Python program, and it all happens within a transaction.
 
When I view a row in ar_trans_due, I want to retrieve data from the source transaction, so I have this -
 
  SELECT * FROM ar_trans_due a
  LEFT JOIN ar_trans b ON b.tran_type = a.tran_type AND b.tran_row_id = a.tran_row_id
 
I understand that PostgreSQL must somehow follow a path from the view ‘ar_trans’ to the physical table ‘ar_tran_inv’, but I assumed it would execute the equivalent of SELECT * FROM ar_tran_inv WHERE row_id = a.tran_row_id AND posted = ‘1’.
 
If this was the case, it would be an indexed read, and very fast. Instead, according to EXPLAIN, it performs a sequential scan of the ‘ar_tran_inv’ table.
 
It also scans ‘ar_tran_crn’ and ‘ar_tran_rec’, but EXPLAIN shows that it uses a Bitmap Heap Scan on those. I assume that is because the tables are currently empty.
 
Is this analysis correct?

please, send EXPLAIN ANALYZE result :) 
 
If so, is there any way to force it to use an indexed read?

set enable_seqscan to off;

Regards

Pavel
 
 
Thanks for any pointers.
 
Frank Millman
 

Re: [GENERAL] a JOIN to a VIEW seems slow

От
"Frank Millman"
Дата:
Pavel Stehule wrote:
 
2017-09-14 10:14 GMT+02:00 Frank Millman <frank@chagford.com>:
Hi all
 
This is a follow-up to a recent question I posted regarding a slow query. I thought that the slowness was caused by the number of JOINs in the query, but with your assistance I have found the true reason. I said in the previous thread that the question had become academic, but now that I understand things better, it is no longer academic as it casts doubt on my whole approach.
 
I have split my AR transaction table into three physical tables – ar_tran_inv, ar_tran_crn, ar_tran_rec. I will probably add others at some point, such as ar_tran_jnl.
 
I then create a VIEW to view all transactions combined. The view is created like this -
 
CREATE VIEW ar_trans AS
  SELECT ‘ar_inv’ AS tran_type, row_id AS tran_row_id, tran_number ... FROM ar_tran_inv WHERE posted = ‘1’
  UNION ALL
  SELECT ‘ar_crn’ AS tran_type, row_id AS tran_row_id, tran_number ... FROM ar_tran_crn WHERE posted = ‘1’
  UNION ALL
  SELECT ‘ar_rec’ AS tran_type, row_id AS tran_row_id, tran_number ... FROM ar_tran_rec WHERE posted = ‘1’
 
I have another table called ‘ar_trans_due’, to keep track of outstanding transactions. All of the three transaction types generate entries into this table. To identify the source of the transaction, I have created columns in ar_trans_due called ‘tran_type’ and ‘tran_row_id’. After inserting a row into ‘ar_tran_inv’, I invoke this -
 
  INSERT INTO ar_trans_due (tran_type, tran_row_id, ...) VALUES (‘ar_inv’, ar_tran_inv.row_id, ...), and similar for the other transaction types. It is handled by a Python program, and it all happens within a transaction.
 
When I view a row in ar_trans_due, I want to retrieve data from the source transaction, so I have this -
 
  SELECT * FROM ar_trans_due a
  LEFT JOIN ar_trans b ON b.tran_type = a.tran_type AND b.tran_row_id = a.tran_row_id
 
I understand that PostgreSQL must somehow follow a path from the view ‘ar_trans’ to the physical table ‘ar_tran_inv’, but I assumed it would execute the equivalent of SELECT * FROM ar_tran_inv WHERE row_id = a.tran_row_id AND posted = ‘1’.
 
If this was the case, it would be an indexed read, and very fast. Instead, according to EXPLAIN, it performs a sequential scan of the ‘ar_tran_inv’ table.
 
It also scans ‘ar_tran_crn’ and ‘ar_tran_rec’, but EXPLAIN shows that it uses a Bitmap Heap Scan on those. I assume that is because the tables are currently empty.
 
Is this analysis correct?
 
please, send EXPLAIN ANALYZE result :)
 
 
I tried to reduce this to its simplest form.
 
Here is a SQL statement -
 
    SELECT *
    FROM ccc.ar_trans_due a
    LEFT JOIN ccc.ar_trans b ON
        b.tran_type = a.tran_type AND b.tran_row_id = a.tran_row_id
    WHERE a.row_id = 1
 
ar_trans_due is a physical table, ar_trans is a view.
 
It takes about 28ms. Here is the explain - https://explain.depesz.com/s/8YY
 
Then I changed it to join each of the physical tables, instead of the view -
 
    SELECT *
    FROM ccc.ar_trans_due a
    LEFT JOIN ccc.ar_tran_inv b ON
        b.row_id = a.tran_row_id
    LEFT JOIN ccc.ar_tran_crn c ON
        c.row_id = a.tran_row_id
    LEFT JOIN ccc.ar_tran_rec d ON
        d.row_id = a.tran_row_id
    WHERE a.row_id = 1
 
This takes just over 1ms. Here is the explain - https://explain.depesz.com/s/U29h
 
I tried setting enable_seq_scan to off – it ran even slower!
 
Frank
 

Re: [GENERAL] a JOIN to a VIEW seems slow

От
Pavel Stehule
Дата:


2017-09-14 14:59 GMT+02:00 Frank Millman <frank@chagford.com>:
Pavel Stehule wrote:
 
2017-09-14 10:14 GMT+02:00 Frank Millman <frank@chagford.com>:
Hi all
 
This is a follow-up to a recent question I posted regarding a slow query. I thought that the slowness was caused by the number of JOINs in the query, but with your assistance I have found the true reason. I said in the previous thread that the question had become academic, but now that I understand things better, it is no longer academic as it casts doubt on my whole approach.
 
I have split my AR transaction table into three physical tables – ar_tran_inv, ar_tran_crn, ar_tran_rec. I will probably add others at some point, such as ar_tran_jnl.
 
I then create a VIEW to view all transactions combined. The view is created like this -
 
CREATE VIEW ar_trans AS
  SELECT ‘ar_inv’ AS tran_type, row_id AS tran_row_id, tran_number ... FROM ar_tran_inv WHERE posted = ‘1’
  UNION ALL
  SELECT ‘ar_crn’ AS tran_type, row_id AS tran_row_id, tran_number ... FROM ar_tran_crn WHERE posted = ‘1’
  UNION ALL
  SELECT ‘ar_rec’ AS tran_type, row_id AS tran_row_id, tran_number ... FROM ar_tran_rec WHERE posted = ‘1’
 
I have another table called ‘ar_trans_due’, to keep track of outstanding transactions. All of the three transaction types generate entries into this table. To identify the source of the transaction, I have created columns in ar_trans_due called ‘tran_type’ and ‘tran_row_id’. After inserting a row into ‘ar_tran_inv’, I invoke this -
 
  INSERT INTO ar_trans_due (tran_type, tran_row_id, ...) VALUES (‘ar_inv’, ar_tran_inv.row_id, ...), and similar for the other transaction types. It is handled by a Python program, and it all happens within a transaction.
 
When I view a row in ar_trans_due, I want to retrieve data from the source transaction, so I have this -
 
  SELECT * FROM ar_trans_due a
  LEFT JOIN ar_trans b ON b.tran_type = a.tran_type AND b.tran_row_id = a.tran_row_id
 
I understand that PostgreSQL must somehow follow a path from the view ‘ar_trans’ to the physical table ‘ar_tran_inv’, but I assumed it would execute the equivalent of SELECT * FROM ar_tran_inv WHERE row_id = a.tran_row_id AND posted = ‘1’.
 
If this was the case, it would be an indexed read, and very fast. Instead, according to EXPLAIN, it performs a sequential scan of the ‘ar_tran_inv’ table.
 
It also scans ‘ar_tran_crn’ and ‘ar_tran_rec’, but EXPLAIN shows that it uses a Bitmap Heap Scan on those. I assume that is because the tables are currently empty.
 
Is this analysis correct?
 
please, send EXPLAIN ANALYZE result :)
 
 
I tried to reduce this to its simplest form.
 
Here is a SQL statement -
 
    SELECT *
    FROM ccc.ar_trans_due a
    LEFT JOIN ccc.ar_trans b ON
        b.tran_type = a.tran_type AND b.tran_row_id = a.tran_row_id
    WHERE a.row_id = 1
 
ar_trans_due is a physical table, ar_trans is a view.
 
It takes about 28ms. Here is the explain - https://explain.depesz.com/s/8YY

 
Then I changed it to join each of the physical tables, instead of the view -
 
    SELECT *
    FROM ccc.ar_trans_due a
    LEFT JOIN ccc.ar_tran_inv b ON
        b.row_id = a.tran_row_id
    LEFT JOIN ccc.ar_tran_crn c ON
        c.row_id = a.tran_row_id
    LEFT JOIN ccc.ar_tran_rec d ON
        d.row_id = a.tran_row_id
    WHERE a.row_id = 1
 
This takes just over 1ms. Here is the explain - https://explain.depesz.com/s/U29h
 
I tried setting enable_seq_scan to off – it ran even slower!
 
Frank
 

Re: [GENERAL] a JOIN to a VIEW seems slow

От
Pavel Stehule
Дата:


2017-09-14 15:09 GMT+02:00 Pavel Stehule <pavel.stehule@gmail.com>:


2017-09-14 14:59 GMT+02:00 Frank Millman <frank@chagford.com>:
Pavel Stehule wrote:
 
2017-09-14 10:14 GMT+02:00 Frank Millman <frank@chagford.com>:
Hi all
 
This is a follow-up to a recent question I posted regarding a slow query. I thought that the slowness was caused by the number of JOINs in the query, but with your assistance I have found the true reason. I said in the previous thread that the question had become academic, but now that I understand things better, it is no longer academic as it casts doubt on my whole approach.
 
I have split my AR transaction table into three physical tables – ar_tran_inv, ar_tran_crn, ar_tran_rec. I will probably add others at some point, such as ar_tran_jnl.
 
I then create a VIEW to view all transactions combined. The view is created like this -
 
CREATE VIEW ar_trans AS
  SELECT ‘ar_inv’ AS tran_type, row_id AS tran_row_id, tran_number ... FROM ar_tran_inv WHERE posted = ‘1’
  UNION ALL
  SELECT ‘ar_crn’ AS tran_type, row_id AS tran_row_id, tran_number ... FROM ar_tran_crn WHERE posted = ‘1’
  UNION ALL
  SELECT ‘ar_rec’ AS tran_type, row_id AS tran_row_id, tran_number ... FROM ar_tran_rec WHERE posted = ‘1’
 
I have another table called ‘ar_trans_due’, to keep track of outstanding transactions. All of the three transaction types generate entries into this table. To identify the source of the transaction, I have created columns in ar_trans_due called ‘tran_type’ and ‘tran_row_id’. After inserting a row into ‘ar_tran_inv’, I invoke this -
 
  INSERT INTO ar_trans_due (tran_type, tran_row_id, ...) VALUES (‘ar_inv’, ar_tran_inv.row_id, ...), and similar for the other transaction types. It is handled by a Python program, and it all happens within a transaction.
 
When I view a row in ar_trans_due, I want to retrieve data from the source transaction, so I have this -
 
  SELECT * FROM ar_trans_due a
  LEFT JOIN ar_trans b ON b.tran_type = a.tran_type AND b.tran_row_id = a.tran_row_id
 
I understand that PostgreSQL must somehow follow a path from the view ‘ar_trans’ to the physical table ‘ar_tran_inv’, but I assumed it would execute the equivalent of SELECT * FROM ar_tran_inv WHERE row_id = a.tran_row_id AND posted = ‘1’.
 
If this was the case, it would be an indexed read, and very fast. Instead, according to EXPLAIN, it performs a sequential scan of the ‘ar_tran_inv’ table.
 
It also scans ‘ar_tran_crn’ and ‘ar_tran_rec’, but EXPLAIN shows that it uses a Bitmap Heap Scan on those. I assume that is because the tables are currently empty.
 
Is this analysis correct?
 
please, send EXPLAIN ANALYZE result :)
 
 
I tried to reduce this to its simplest form.
 
Here is a SQL statement -
 
    SELECT *
    FROM ccc.ar_trans_due a
    LEFT JOIN ccc.ar_trans b ON
        b.tran_type = a.tran_type AND b.tran_row_id = a.tran_row_id
    WHERE a.row_id = 1
 
ar_trans_due is a physical table, ar_trans is a view.
 
It takes about 28ms. Here is the explain - https://explain.depesz.com/s/8YY
 

The PostgreSQL cannot to push join - in slow case, the UNIONS should be done first - and it requires full scan ar_tran_inv - used filter (posted AND (deleted_id = 0) is not too effective - maybe some composite or partial index helps.

The fast query doesn't contains unions - so there are bigger space for optimizer - ar_tran_inv is filtered effective - by primary key. 

So main problem is impossible to push information a.row_id = 1 to deep to query.


 
Then I changed it to join each of the physical tables, instead of the view -
 
    SELECT *
    FROM ccc.ar_trans_due a
    LEFT JOIN ccc.ar_tran_inv b ON
        b.row_id = a.tran_row_id
    LEFT JOIN ccc.ar_tran_crn c ON
        c.row_id = a.tran_row_id
    LEFT JOIN ccc.ar_tran_rec d ON
        d.row_id = a.tran_row_id
    WHERE a.row_id = 1
 
This takes just over 1ms. Here is the explain - https://explain.depesz.com/s/U29h
 
I tried setting enable_seq_scan to off – it ran even slower!
 
Frank
 


Re: [GENERAL] a JOIN to a VIEW seems slow

От
"Frank Millman"
Дата:
 
2017-09-14 15:09 GMT+02:00 Pavel Stehule <pavel.stehule@gmail.com>:
 
 
2017-09-14 14:59 GMT+02:00 Frank Millman <frank@chagford.com>:
Pavel Stehule wrote:
 
2017-09-14 10:14 GMT+02:00 Frank Millman <frank@chagford.com>:
Hi all
 
This is a follow-up to a recent question I posted regarding a slow query. I thought that the slowness was caused by the number of JOINs in the query, but with your assistance I have found the true reason. I said in the previous thread that the question had become academic, but now that I understand things better, it is no longer academic as it casts doubt on my whole approach.
 
I have split my AR transaction table into three physical tables – ar_tran_inv, ar_tran_crn, ar_tran_rec. I will probably add others at some point, such as ar_tran_jnl.
 
I then create a VIEW to view all transactions combined. The view is created like this -
 
CREATE VIEW ar_trans AS
  SELECT ‘ar_inv’ AS tran_type, row_id AS tran_row_id, tran_number ... FROM ar_tran_inv WHERE posted = ‘1’
  UNION ALL
  SELECT ‘ar_crn’ AS tran_type, row_id AS tran_row_id, tran_number ... FROM ar_tran_crn WHERE posted = ‘1’
  UNION ALL
  SELECT ‘ar_rec’ AS tran_type, row_id AS tran_row_id, tran_number ... FROM ar_tran_rec WHERE posted = ‘1’
 
I have another table called ‘ar_trans_due’, to keep track of outstanding transactions. All of the three transaction types generate entries into this table. To identify the source of the transaction, I have created columns in ar_trans_due called ‘tran_type’ and ‘tran_row_id’. After inserting a row into ‘ar_tran_inv’, I invoke this -
 
  INSERT INTO ar_trans_due (tran_type, tran_row_id, ...) VALUES (‘ar_inv’, ar_tran_inv.row_id, ...), and similar for the other transaction types. It is handled by a Python program, and it all happens within a transaction.
 
When I view a row in ar_trans_due, I want to retrieve data from the source transaction, so I have this -
 
  SELECT * FROM ar_trans_due a
  LEFT JOIN ar_trans b ON b.tran_type = a.tran_type AND b.tran_row_id = a.tran_row_id
 
I understand that PostgreSQL must somehow follow a path from the view ‘ar_trans’ to the physical table ‘ar_tran_inv’, but I assumed it would execute the equivalent of SELECT * FROM ar_tran_inv WHERE row_id = a.tran_row_id AND posted = ‘1’.
 
If this was the case, it would be an indexed read, and very fast. Instead, according to EXPLAIN, it performs a sequential scan of the ‘ar_tran_inv’ table.
 
It also scans ‘ar_tran_crn’ and ‘ar_tran_rec’, but EXPLAIN shows that it uses a Bitmap Heap Scan on those. I assume that is because the tables are currently empty.
 
Is this analysis correct?
 
please, send EXPLAIN ANALYZE result :)
 
 
I tried to reduce this to its simplest form.
 
Here is a SQL statement -
 
    SELECT *
    FROM ccc.ar_trans_due a
    LEFT JOIN ccc.ar_trans b ON
        b.tran_type = a.tran_type AND b.tran_row_id = a.tran_row_id
    WHERE a.row_id = 1
 
ar_trans_due is a physical table, ar_trans is a view.
 
It takes about 28ms. Here is the explain - https://explain.depesz.com/s/8YY
 
 
> The PostgreSQL cannot to push join - in slow case, the UNIONS should be done first - and it requires full scan ar_tran_inv - used filter (posted AND (deleted_id = 0) is not too effective - maybe some composite or partial index helps.
>
> The fast query doesn't contains unions - so there are bigger space for optimizer - ar_tran_inv is filtered effective - by primary key.
>
> So main problem is impossible to push information a.row_id = 1 to deep to query.
>
 
Sorry for banging on about this, but someone might be interested in the following timings.
 
The only solution I could find was to ‘denormalise’ (if that is a word) and create additional columns on ar_trans_due for cust_row_id and tran_date, to avoid using any joins.
 
Once I had done that, I could run my query two ways –
  1. using the newly created columns
  2. as before, using a join to the view, which in turn retrieved data from the underlying tables.
 
This was a more complex query than the example above – details available on request.
 
Here are the timings for running the query on identical data sets using Postgresql, Sql Server, and Sqlite3 -
 
PostgreSQL -
    Method 1 - 0.28 sec
    Method 2 – 1607 sec, or 26 minutes
 
Sql Server -
    Method 1 – 0.33 sec
    Method 2 – 1.8 sec
 
Sqlite3 -
    Method 1 – 0.15 sec
    Method 2 – 1.0 sec
 
It seems that Sql Server and Sqlite3 are able to analyse the ‘join’, and execute an indexed read against the underlying physical tables.
 
Frank
 

Re: [GENERAL] a JOIN to a VIEW seems slow

От
Merlin Moncure
Дата:
On Thu, Sep 14, 2017 at 8:17 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> 2017-09-14 15:09 GMT+02:00 Pavel Stehule <pavel.stehule@gmail.com>:
>>
>>
>>
>> 2017-09-14 14:59 GMT+02:00 Frank Millman <frank@chagford.com>:
>>>
>>> Pavel Stehule wrote:
>>>
>>> 2017-09-14 10:14 GMT+02:00 Frank Millman <frank@chagford.com>:
>>>>
>>>> Hi all
>>>>
>>>> This is a follow-up to a recent question I posted regarding a slow
>>>> query. I thought that the slowness was caused by the number of JOINs in the
>>>> query, but with your assistance I have found the true reason. I said in the
>>>> previous thread that the question had become academic, but now that I
>>>> understand things better, it is no longer academic as it casts doubt on my
>>>> whole approach.
>>>>
>>>> I have split my AR transaction table into three physical tables –
>>>> ar_tran_inv, ar_tran_crn, ar_tran_rec. I will probably add others at some
>>>> point, such as ar_tran_jnl.
>>>>
>>>> I then create a VIEW to view all transactions combined. The view is
>>>> created like this -
>>>>
>>>> CREATE VIEW ar_trans AS
>>>>   SELECT ‘ar_inv’ AS tran_type, row_id AS tran_row_id, tran_number ...
>>>> FROM ar_tran_inv WHERE posted = ‘1’
>>>>   UNION ALL
>>>>   SELECT ‘ar_crn’ AS tran_type, row_id AS tran_row_id, tran_number ...
>>>> FROM ar_tran_crn WHERE posted = ‘1’
>>>>   UNION ALL
>>>>   SELECT ‘ar_rec’ AS tran_type, row_id AS tran_row_id, tran_number ...
>>>> FROM ar_tran_rec WHERE posted = ‘1’
>>>>
>>>> I have another table called ‘ar_trans_due’, to keep track of outstanding
>>>> transactions. All of the three transaction types generate entries into this
>>>> table. To identify the source of the transaction, I have created columns in
>>>> ar_trans_due called ‘tran_type’ and ‘tran_row_id’. After inserting a row
>>>> into ‘ar_tran_inv’, I invoke this -
>>>>
>>>>   INSERT INTO ar_trans_due (tran_type, tran_row_id, ...) VALUES
>>>> (‘ar_inv’, ar_tran_inv.row_id, ...), and similar for the other transaction
>>>> types. It is handled by a Python program, and it all happens within a
>>>> transaction.
>>>>
>>>> When I view a row in ar_trans_due, I want to retrieve data from the
>>>> source transaction, so I have this -
>>>>
>>>>   SELECT * FROM ar_trans_due a
>>>>   LEFT JOIN ar_trans b ON b.tran_type = a.tran_type AND b.tran_row_id =
>>>> a.tran_row_id
>>>>
>>>> I understand that PostgreSQL must somehow follow a path from the view
>>>> ‘ar_trans’ to the physical table ‘ar_tran_inv’, but I assumed it would
>>>> execute the equivalent of SELECT * FROM ar_tran_inv WHERE row_id =
>>>> a.tran_row_id AND posted = ‘1’.
>>>>
>>>> If this was the case, it would be an indexed read, and very fast.
>>>> Instead, according to EXPLAIN, it performs a sequential scan of the
>>>> ‘ar_tran_inv’ table.
>>>>
>>>> It also scans ‘ar_tran_crn’ and ‘ar_tran_rec’, but EXPLAIN shows that it
>>>> uses a Bitmap Heap Scan on those. I assume that is because the tables are
>>>> currently empty.
>>>>
>>>> Is this analysis correct?
>>>
>>>
>>> please, send EXPLAIN ANALYZE result :)
>>>>
>>>>
>>>
>>>
>>> I tried to reduce this to its simplest form.
>>>
>>> Here is a SQL statement -
>>>
>>>     SELECT *
>>>     FROM ccc.ar_trans_due a
>>>     LEFT JOIN ccc.ar_trans b ON
>>>         b.tran_type = a.tran_type AND b.tran_row_id = a.tran_row_id
>>>     WHERE a.row_id = 1
>>>
>>> ar_trans_due is a physical table, ar_trans is a view.
>>>
>>> It takes about 28ms. Here is the explain -
>>> https://explain.depesz.com/s/8YY
>>>
>>>
>
>
> The PostgreSQL cannot to push join - in slow case, the UNIONS should be done
> first - and it requires full scan ar_tran_inv - used filter (posted AND
> (deleted_id = 0) is not too effective - maybe some composite or partial
> index helps.


In my testing JOINS can push through UNION ALL.   Why do we need to
materialize  union first?  What version is this?

merlin


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] a JOIN to a VIEW seems slow

От
"Frank Millman"
Дата:
Merlin Moncure wrote:
On Thu, Sep 14, 2017 at 8:17 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>
> >
> > The PostgreSQL cannot to push join - in slow case, the UNIONS should be done
>
> first - and it requires full scan ar_tran_inv - used filter (posted AND
>
> (deleted_id = 0) is not too effective - maybe some composite or partial
>
> index helps.
>

>
In my testing JOINS can push through UNION ALL.   Why do we need to
>
materialize  union first?  What version is this?
>

I am using version 9.4.4 on Fedora 22.
 
Frank Millman

Re: [GENERAL] a JOIN to a VIEW seems slow

От
"Frank Millman"
Дата:
On 2017-09-18 Frank Millman wrote:
>
> Here are the timings for running the query on identical data sets using Postgresql, Sql Server, and Sqlite3 -
>
> PostgreSQL -
>     Method 1 - 0.28 sec
>     Method 2 – 1607 sec, or 26 minutes
>
> Sql Server -
>     Method 1 – 0.33 sec
>     Method 2 – 1.8 sec
>
> Sqlite3 -
>     Method 1 – 0.15 sec
>     Method 2 – 1.0 sec
>
> It seems that Sql Server and Sqlite3 are able to analyse the ‘join’, and execute an indexed read against the underlying physical tables.
>
 
I did not get any response to this, but I am still persevering, and feel that I am getting closer. Instead of waiting 26 minutes for a result, I realise that I can learn a lot by using EXPLAIN. This is what I have found out.
 
To recap, I have the following tables -
 
1. ‘ar_tran_inv’, to store invoices
2. ‘ar_tran_crn’, to store credit notes
3. ‘ar_tran_rec’ to store receipts
 
This is a subset of their common columns -
    row_id INT SERIAL PRIMARY KEY,
    tran_number VARCHAR,
    posted BOOL,
 
I have created a VIEW called ‘ar_trans’ to combine them -
 
CREATE VIEW ar_trans AS
  SELECT ‘ar_inv’ AS tran_type, row_id AS tran_row_id, tran_number ... FROM ar_tran_inv WHERE posted = ‘1’
  UNION ALL
  SELECT ‘ar_crn’ AS tran_type, row_id AS tran_row_id, tran_number ... FROM ar_tran_crn WHERE posted = ‘1’
  UNION ALL
  SELECT ‘ar_rec’ AS tran_type, row_id AS tran_row_id, tran_number ... FROM ar_tran_rec WHERE posted = ‘1’
 
I have another table called ‘ar_trans_due’, where a row is inserted whenever a row is inserted into any of the three transaction tables. To identify the source transaction, I have the following columns -
    tran_type VARCHAR – can be ‘ar_inv’, ‘ar_crn’, or ‘ar_rec’
    tran_row_id INT – the primary key of the originating transaction
 
Now here are my tests -
 
1. =======================================================
 
SELECT tran_type, tran_row_id FROM ar_trans_due WHERE row_id = 1;
 
tran_type | tran_row_id
-----------+-------------
ar_inv    |           1
(1 row)
 
Just to give me some sample data to work with.
 
2. =======================================================
 
EXPLAIN SELECT * FROM ar_trans WHERE tran_type = ‘ar_inv’ AND tran_row_id = 1;
 
                                        QUERY PLAN                                        
-------------------------------------------------------------------------------------------
Append  (cost=0.29..8.32 rows=1 width=117)
   ->  Index Scan using ar_tran_inv_pkey on ar_tran_inv  (cost=0.29..8.31 rows=1 width=46)
         Index Cond: (row_id = 1)
         Filter: posted
(4 rows)
 
This is a select against the view. It has worked out that the underlying table to use is ‘ar_tran_inv’, and performed an indexed read.
 
3. =======================================================
 
EXPLAIN SELECT * FROM ar_trans_due a LEFT JOIN ar_tran_inv b ON b.row_id = a.tran_row_id where a.row_id = 1;
 
                                          QUERY PLAN                                          
-----------------------------------------------------------------------------------------------
Nested Loop Left Join  (cost=0.58..16.62 rows=1 width=142)
   ->  Index Scan using ar_trans_due_pkey on ar_trans_due a  (cost=0.29..8.31 rows=1 width=52)
         Index Cond: (row_id = 1)
   ->  Index Scan using ar_tran_inv_pkey on ar_tran_inv b  (cost=0.29..8.30 rows=1 width=90)
         Index Cond: (row_id = a.tran_row_id)
(5 rows)
 
Here I have selected a row from ar_trans_due, and joined the underlying physical table directly. It uses an indexed read to perform the join.
 
4. =======================================================
 
EXPLAIN SELECT * FROM ar_trans_due a LEFT JOIN ar_trans b ON b.tran_type = a.tran_type AND b.tran_row_id = a.tran_row_id where a.row_id = 1;
 
                                             QUERY PLAN                                             
-----------------------------------------------------------------------------------------------------
Hash Right Join  (cost=8.32..2072.99 rows=1 width=169)
   Hash Cond: ((('ar_inv'::text) = (a.tran_type)::text) AND (ar_tran_inv.row_id = a.tran_row_id))
   ->  Append  (cost=0.00..1739.47 rows=43357 width=117)
         ->  Seq Scan on ar_tran_inv  (cost=0.00..676.01 rows=21601 width=46)
               Filter: posted
         ->  Seq Scan on ar_tran_crn  (cost=0.00..13.88 rows=155 width=124)
               Filter: posted
         ->  Seq Scan on ar_tran_rec  (cost=0.00..616.01 rows=21601 width=40)
               Filter: posted
   ->  Hash  (cost=8.31..8.31 rows=1 width=52)
         ->  Index Scan using ar_trans_due_pkey on ar_trans_due a  (cost=0.29..8.31 rows=1 width=52)
               Index Cond: (row_id = 1)
(12 rows)
 
Here I have selected the same row, and joined the view ‘ar_trans’. It seems to have all the information necessary to perform an indexed read, but instead it performs a sequential scan of all three of the underlying tables. I don’t know why it shows 155 rows for ar_tran_crn – the table is actually empty.
 
I am using version 9.4.4 on Fedora 22. I don’t want to upgrade just for the sake of it, but if any work has been done in this area for 9.5 or 9.6, that would be the obvious first thing to try.
 
Any suggestions welcomed.
 
Frank Millman
 

Re: [GENERAL] a JOIN to a VIEW seems slow

От
Merlin Moncure
Дата:
On Thu, Sep 21, 2017 at 1:51 AM, Frank Millman <frank@chagford.com> wrote:
> On 2017-09-18 Frank Millman wrote:
>>
>> Here are the timings for running the query on identical data sets using
>> Postgresql, Sql Server, and Sqlite3 -
>>
>> PostgreSQL -
>>     Method 1 - 0.28 sec
>>     Method 2 – 1607 sec, or 26 minutes
>>
>> Sql Server -
>>     Method 1 – 0.33 sec
>>     Method 2 – 1.8 sec
>>
>> Sqlite3 -
>>     Method 1 – 0.15 sec
>>     Method 2 – 1.0 sec
>>
>> It seems that Sql Server and Sqlite3 are able to analyse the ‘join’, and
>> execute an indexed read against the underlying physical tables.
>>
>
> I did not get any response to this, but I am still persevering, and feel
> that I am getting closer. Instead of waiting 26 minutes for a result, I
> realise that I can learn a lot by using EXPLAIN. This is what I have found
> out.

Something is not adding up here.  Can you EXPLAIN ANALYZE the 26 minute query?

merlin


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] a JOIN to a VIEW seems slow

От
"Frank Millman"
Дата:
On Thu, Sep 21, 2017 at 3:54 PM, Merlin Moncure wrote:
>
> > On Thu, Sep 21, 2017 at 1:51 AM, Frank Millman <frank@chagford.com> wrote:
> >
>
> I did not get any response to this, but I am still persevering, and feel
>
> that I am getting closer. Instead of waiting 26 minutes for a result, I
>
> realise that I can learn a lot by using EXPLAIN. This is what I have found
>
> out.
>

 
>
Something is not adding up here.  Can you EXPLAIN ANALYZE the 26 minute query?
>

 
I’m working on it, but my machine is playing up and it is getting late, so I will try again tomorrow.
 
A passing comment – the 26 minute query is more complex, so will need some explaining (no pun intended). I was hoping that my simplified example would illustrate what I think is the problem.
 
Anyway, here is the query -
 
SELECT q.cust_row_id,
SUM(CASE WHEN q.tran_date > '2015-08-31' THEN q.balance ELSE 0 END
    ) AS "balance_curr AS [DECTEXT]",
SUM(CASE WHEN q.tran_date <= '2015-08-31' AND q.tran_date > '2015-07-31' THEN q.balance ELSE 0 END
    ) AS "balance_30 AS [DECTEXT]",
SUM(CASE WHEN q.tran_date <= '2015-07-31' AND q.tran_date > '2015-06-30' THEN q.balance ELSE 0 END
    ) AS "balance_60 AS [DECTEXT]",
SUM(CASE WHEN q.tran_date <= '2015-06-30' AND q.tran_date > '2015-05-31' THEN q.balance ELSE 0 END
    ) AS "balance_90 AS [DECTEXT]",
SUM(CASE WHEN q.tran_date <= '2015-05-31' THEN q.balance ELSE 0 END
    ) AS "balance_120 AS [DECTEXT]"
FROM
(SELECT
    due_trans.cust_row_id,
    due_trans.tran_date,
    trans_due.amount_cust +
        COALESCE((SELECT SUM(trans_alloc.alloc_cust+trans_alloc.disc_cust)
            FROM prop.ar_trans_alloc trans_alloc
            LEFT JOIN prop.ar_trans alloc_trans ON
                alloc_trans.tran_type = trans_alloc.tran_type
                AND alloc_trans.tran_row_id = trans_alloc.tran_row_id
            WHERE trans_alloc.due_row_id = trans_due.row_id
                AND alloc_trans.tran_date <= '2015-09-30'
            ), 0)
    AS balance
    FROM prop.ar_trans_due trans_due
    LEFT JOIN prop.ar_trans due_trans ON
        due_trans.tran_type = trans_due.tran_type
        AND due_trans.tran_row_id = trans_due.tran_row_id
    WHERE due_trans.tran_date <= '2015-09-30'
) AS q
GROUP BY q.cust_row_id
ORDER BY q.cust_row_id;
 
I will report back with the EXPLAIN ANALYSE tomorrow.
 
Frank
 

Re: [GENERAL] a JOIN to a VIEW seems slow

От
Merlin Moncure
Дата:
On Thu, Sep 21, 2017 at 12:02 PM, Frank Millman <frank@chagford.com> wrote:
> On Thu, Sep 21, 2017 at 3:54 PM, Merlin Moncure wrote:
>>
>> > On Thu, Sep 21, 2017 at 1:51 AM, Frank Millman <frank@chagford.com>
>> > wrote:
>> >
>>
>> I did not get any response to this, but I am still persevering, and feel
>>
>> that I am getting closer. Instead of waiting 26 minutes for a result, I
>>
>> realise that I can learn a lot by using EXPLAIN. This is what I have found
>>
>> out.
>>
>
>
>>
> Something is not adding up here.  Can you EXPLAIN ANALYZE the 26 minute
> query?
>>
>
>
> I’m working on it, but my machine is playing up and it is getting late, so I
> will try again tomorrow.
>
> A passing comment – the 26 minute query is more complex, so will need some
> explaining (no pun intended). I was hoping that my simplified example would
> illustrate what I think is the problem.
>
> Anyway, here is the query -
>
> SELECT q.cust_row_id,
> SUM(CASE WHEN q.tran_date > '2015-08-31' THEN q.balance ELSE 0 END
>     ) AS "balance_curr AS [DECTEXT]",
> SUM(CASE WHEN q.tran_date <= '2015-08-31' AND q.tran_date > '2015-07-31'
> THEN q.balance ELSE 0 END
>     ) AS "balance_30 AS [DECTEXT]",
> SUM(CASE WHEN q.tran_date <= '2015-07-31' AND q.tran_date > '2015-06-30'
> THEN q.balance ELSE 0 END
>     ) AS "balance_60 AS [DECTEXT]",
> SUM(CASE WHEN q.tran_date <= '2015-06-30' AND q.tran_date > '2015-05-31'
> THEN q.balance ELSE 0 END
>     ) AS "balance_90 AS [DECTEXT]",
> SUM(CASE WHEN q.tran_date <= '2015-05-31' THEN q.balance ELSE 0 END
>     ) AS "balance_120 AS [DECTEXT]"
> FROM
> (SELECT
>     due_trans.cust_row_id,
>     due_trans.tran_date,
>     trans_due.amount_cust +
>         COALESCE((SELECT SUM(trans_alloc.alloc_cust+trans_alloc.disc_cust)
>             FROM prop.ar_trans_alloc trans_alloc
>             LEFT JOIN prop.ar_trans alloc_trans ON
>                 alloc_trans.tran_type = trans_alloc.tran_type
>                 AND alloc_trans.tran_row_id = trans_alloc.tran_row_id
>             WHERE trans_alloc.due_row_id = trans_due.row_id
>                 AND alloc_trans.tran_date <= '2015-09-30'
>             ), 0)
>     AS balance
>     FROM prop.ar_trans_due trans_due
>     LEFT JOIN prop.ar_trans due_trans ON
>         due_trans.tran_type = trans_due.tran_type
>         AND due_trans.tran_row_id = trans_due.tran_row_id
>     WHERE due_trans.tran_date <= '2015-09-30'
> ) AS q
> GROUP BY q.cust_row_id
> ORDER BY q.cust_row_id;
>
> I will report back with the EXPLAIN ANALYSE tomorrow.

Thank you.  It's a minor point, but take a look at the FILTER syntax here:
https://www.postgresql.org/docs/9.4/static/sql-expressions.html#SYNTAX-AGGREGATES

For example,
> SUM(CASE WHEN q.tran_date > '2015-08-31' THEN q.balance ELSE 0 END
>     ) AS "balance_curr AS [DECTEXT]",

Could be rewritten as:
SUM(q.balance) FILTER (WHERE  q.tran_date > '2015-08-31') AS "balance_curr AS [DECTEXT]",

Besides being more terse and clear, FILTER expressions in my
experience tend to be somewhat faster than aggregations over CASE
statements.

merlin


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] a JOIN to a VIEW seems slow

От
"Frank Millman"
Дата:
 
On Thu, Sep 21, 2017 at 3:54 PM, Merlin Moncure wrote:
> Something is not adding up here.  Can you EXPLAIN ANALYZE the 26 minute
> query?
Here it is -
 
 
Frank
 

Re: [GENERAL] a JOIN to a VIEW seems slow

От
Merlin Moncure
Дата:
On Thu, Sep 21, 2017 at 12:02 PM, Frank Millman <frank@chagford.com> wrote:
> On Thu, Sep 21, 2017 at 3:54 PM, Merlin Moncure wrote:
>>
>> > On Thu, Sep 21, 2017 at 1:51 AM, Frank Millman <frank@chagford.com>
>> > wrote:
>> >
>>
>> I did not get any response to this, but I am still persevering, and feel
>>
>> that I am getting closer. Instead of waiting 26 minutes for a result, I
>>
>> realise that I can learn a lot by using EXPLAIN. This is what I have found
>>
>> out.
>>
>
>
>>
> Something is not adding up here.  Can you EXPLAIN ANALYZE the 26 minute
> query?
>>
>
>
> I’m working on it, but my machine is playing up and it is getting late, so I
> will try again tomorrow.
>
> A passing comment – the 26 minute query is more complex, so will need some
> explaining (no pun intended). I was hoping that my simplified example would
> illustrate what I think is the problem.
>
> Anyway, here is the query -
>
> SELECT q.cust_row_id,
> SUM(CASE WHEN q.tran_date > '2015-08-31' THEN q.balance ELSE 0 END
>     ) AS "balance_curr AS [DECTEXT]",
> SUM(CASE WHEN q.tran_date <= '2015-08-31' AND q.tran_date > '2015-07-31'
> THEN q.balance ELSE 0 END
>     ) AS "balance_30 AS [DECTEXT]",
> SUM(CASE WHEN q.tran_date <= '2015-07-31' AND q.tran_date > '2015-06-30'
> THEN q.balance ELSE 0 END
>     ) AS "balance_60 AS [DECTEXT]",
> SUM(CASE WHEN q.tran_date <= '2015-06-30' AND q.tran_date > '2015-05-31'
> THEN q.balance ELSE 0 END
>     ) AS "balance_90 AS [DECTEXT]",
> SUM(CASE WHEN q.tran_date <= '2015-05-31' THEN q.balance ELSE 0 END
>     ) AS "balance_120 AS [DECTEXT]"
> FROM
> (SELECT
>     due_trans.cust_row_id,
>     due_trans.tran_date,
>     trans_due.amount_cust +
>         COALESCE((SELECT SUM(trans_alloc.alloc_cust+trans_alloc.disc_cust)
>             FROM prop.ar_trans_alloc trans_alloc
>             LEFT JOIN prop.ar_trans alloc_trans ON
>                 alloc_trans.tran_type = trans_alloc.tran_type
>                 AND alloc_trans.tran_row_id = trans_alloc.tran_row_id
>             WHERE trans_alloc.due_row_id = trans_due.row_id
>                 AND alloc_trans.tran_date <= '2015-09-30'
>             ), 0)
>     AS balance
>     FROM prop.ar_trans_due trans_due
>     LEFT JOIN prop.ar_trans due_trans ON
>         due_trans.tran_type = trans_due.tran_type
>         AND due_trans.tran_row_id = trans_due.tran_row_id
>     WHERE due_trans.tran_date <= '2015-09-30'
> ) AS q
> GROUP BY q.cust_row_id
> ORDER BY q.cust_row_id;


What is the performance with this portion simplified out?

>         COALESCE((SELECT SUM(trans_alloc.alloc_cust+trans_alloc.disc_cust)
>             FROM prop.ar_trans_alloc trans_alloc
>             LEFT JOIN prop.ar_trans alloc_trans ON
>                 alloc_trans.tran_type = trans_alloc.tran_type
>                 AND alloc_trans.tran_row_id = trans_alloc.tran_row_id
>             WHERE trans_alloc.due_row_id = trans_due.row_id
>                 AND alloc_trans.tran_date <= '2015-09-30'
>             ), 0)

Change that to just '0' and rerun the query.  If timings are good, I
think we want to explore converting this to LATERAL type join.  I
think (but am not sure) this is defeating the optimizer.   Also, is
this the actual query you want to run quickly?  You are not filtering
on cust_row_id?

merlin


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] a JOIN to a VIEW seems slow

От
"Frank Millman"
Дата:
Merlin Moncure wrote:
 
On Thu, Sep 21, 2017 at 12:02 PM, Frank Millman <frank@chagford.com> wrote:
>
> SELECT q.cust_row_id,
> SUM(CASE WHEN q.tran_date > '2015-08-31' THEN q.balance ELSE 0 END
>     ) AS "balance_curr AS [DECTEXT]",
> SUM(CASE WHEN q.tran_date <= '2015-08-31' AND q.tran_date > '2015-07-31'
> THEN q.balance ELSE 0 END
>     ) AS "balance_30 AS [DECTEXT]",
> SUM(CASE WHEN q.tran_date <= '2015-07-31' AND q.tran_date > '2015-06-30'
> THEN q.balance ELSE 0 END
>     ) AS "balance_60 AS [DECTEXT]",
> SUM(CASE WHEN q.tran_date <= '2015-06-30' AND q.tran_date > '2015-05-31'
> THEN q.balance ELSE 0 END
>     ) AS "balance_90 AS [DECTEXT]",
> SUM(CASE WHEN q.tran_date <= '2015-05-31' THEN q.balance ELSE 0 END
>     ) AS "balance_120 AS [DECTEXT]"
> FROM
> (SELECT
>     due_trans.cust_row_id,
>     due_trans.tran_date,
>     trans_due.amount_cust +
>         COALESCE((SELECT SUM(trans_alloc.alloc_cust+trans_alloc.disc_cust)
>             FROM prop.ar_trans_alloc trans_alloc
>             LEFT JOIN prop.ar_trans alloc_trans ON
>                 alloc_trans.tran_type = trans_alloc.tran_type
>                 AND alloc_trans.tran_row_id = trans_alloc.tran_row_id
>             WHERE trans_alloc.due_row_id = trans_due.row_id
>                 AND alloc_trans.tran_date <= '2015-09-30'
>             ), 0)
>     AS balance
>     FROM prop.ar_trans_due trans_due
>     LEFT JOIN prop.ar_trans due_trans ON
>         due_trans.tran_type = trans_due.tran_type
>         AND due_trans.tran_row_id = trans_due.tran_row_id
>     WHERE due_trans.tran_date <= '2015-09-30'
> ) AS q
> GROUP BY q.cust_row_id
> ORDER BY q.cust_row_id;


> What is the performance with this portion simplified out?

>         COALESCE((SELECT SUM(trans_alloc.alloc_cust+trans_alloc.disc_cust)
>             FROM prop.ar_trans_alloc trans_alloc
>             LEFT JOIN prop.ar_trans alloc_trans ON
>                 alloc_trans.tran_type = trans_alloc.tran_type
>                 AND alloc_trans.tran_row_id = trans_alloc.tran_row_id
>             WHERE trans_alloc.due_row_id = trans_due.row_id
>                 AND alloc_trans.tran_date <= '2015-09-30'
>             ), 0)

> Change that to just '0' and rerun the query.  If timings are good, I
> think we want to explore converting this to LATERAL type join.  I
> think (but am not sure) this is defeating the optimizer.   Also, is
> this the actual query you want to run quickly?  You are not filtering
> on cust_row_id?
 
It makes a big difference – the query runs in 0.18 seconds.
 
This query can be used to return the age analysis for a single debtor or for all debtors, so yes I would sometimes run it without filtering.
 
A couple of comments -
 
1. My app supports 3 databases, PostgreSQL, SqlServer, and SQLite3. I am trying to keep my SQL as generic as possible. However, if I have to use something that is PostgreSQL-specific, I may have to live with that.
 
2. This is probably irrelevant but here is the query plan that SQLite3 creates -
 
3|0|0|SCAN TABLE ar_tran_inv
4|0|0|SCAN TABLE ar_tran_crn
2|0|0|COMPOUND SUBQUERIES 3 AND 4 (UNION ALL)
5|0|0|SCAN TABLE ar_tran_rec
1|0|0|COMPOUND SUBQUERIES 2 AND 5 (UNION ALL)
0|0|0|SCAN TABLE ar_trans_due AS trans_due
0|1|1|SEARCH SUBQUERY 1 AS due_trans USING AUTOMATIC COVERING INDEX (tran_type=? AND tran_row_id=?)
0|0|0|USE TEMP B-TREE FOR GROUP BY
0|0|0|EXECUTE CORRELATED SCALAR SUBQUERY 6
9|0|0|SCAN TABLE ar_tran_inv
10|0|0|SCAN TABLE ar_tran_crn
8|0|0|COMPOUND SUBQUERIES 9 AND 10 (UNION ALL)
11|0|0|SCAN TABLE ar_tran_rec
7|0|0|COMPOUND SUBQUERIES 8 AND 11 (UNION ALL)
6|0|0|SEARCH TABLE ar_trans_alloc AS trans_alloc USING INDEX _ar_trans_alloc_due (due_row_id=?)
6|1|1|SEARCH SUBQUERY 7 AS alloc_trans USING AUTOMATIC COVERING INDEX (tran_type=? AND tran_row_id=?)
0|0|0|EXECUTE CORRELATED SCALAR SUBQUERY 12
15|0|0|SCAN TABLE ar_tran_inv
16|0|0|SCAN TABLE ar_tran_crn
14|0|0|COMPOUND SUBQUERIES 15 AND 16 (UNION ALL)
17|0|0|SCAN TABLE ar_tran_rec
13|0|0|COMPOUND SUBQUERIES 14 AND 17 (UNION ALL)
12|0|0|SEARCH TABLE ar_trans_alloc AS trans_alloc USING INDEX _ar_trans_alloc_due (due_row_id=?)
12|1|1|SEARCH SUBQUERY 13 AS alloc_trans USING AUTOMATIC COVERING INDEX (tran_type=? AND tran_row_id=?)
0|0|0|EXECUTE CORRELATED SCALAR SUBQUERY 18
21|0|0|SCAN TABLE ar_tran_inv
22|0|0|SCAN TABLE ar_tran_crn
20|0|0|COMPOUND SUBQUERIES 21 AND 22 (UNION ALL)
23|0|0|SCAN TABLE ar_tran_rec
19|0|0|COMPOUND SUBQUERIES 20 AND 23 (UNION ALL)
18|0|0|SEARCH TABLE ar_trans_alloc AS trans_alloc USING INDEX _ar_trans_alloc_due (due_row_id=?)
18|1|1|SEARCH SUBQUERY 19 AS alloc_trans USING AUTOMATIC COVERING INDEX (tran_type=? AND tran_row_id=?)
0|0|0|EXECUTE CORRELATED SCALAR SUBQUERY 24
27|0|0|SCAN TABLE ar_tran_inv
28|0|0|SCAN TABLE ar_tran_crn
26|0|0|COMPOUND SUBQUERIES 27 AND 28 (UNION ALL)
29|0|0|SCAN TABLE ar_tran_rec
25|0|0|COMPOUND SUBQUERIES 26 AND 29 (UNION ALL)
24|0|0|SEARCH TABLE ar_trans_alloc AS trans_alloc USING INDEX _ar_trans_alloc_due (due_row_id=?)
24|1|1|SEARCH SUBQUERY 25 AS alloc_trans USING AUTOMATIC COVERING INDEX (tran_type=? AND tran_row_id=?)
0|0|0|EXECUTE CORRELATED SCALAR SUBQUERY 30
33|0|0|SCAN TABLE ar_tran_inv
34|0|0|SCAN TABLE ar_tran_crn
32|0|0|COMPOUND SUBQUERIES 33 AND 34 (UNION ALL)
35|0|0|SCAN TABLE ar_tran_rec
31|0|0|COMPOUND SUBQUERIES 32 AND 35 (UNION ALL)
30|0|0|SEARCH TABLE ar_trans_alloc AS trans_alloc USING INDEX _ar_trans_alloc_due (due_row_id=?)
30|1|1|SEARCH SUBQUERY 31 AS alloc_trans USING AUTOMATIC COVERING INDEX (tran_type=? AND tran_row_id=?)
 
I *think* that the important line is the last one (repeated elsewhere in the plan as well) – when joining alloc_trans, it uses an index on tran_type and tran_row_id. This seems to be what PostgreSQL is not doing.
 
Frank
 

Re: [GENERAL] a JOIN to a VIEW seems slow

От
"Frank Millman"
Дата:
Frank Millman wrote:
>  
>  On Thu, Sep 21, 2017 at 3:54 PM, Merlin Moncure wrote:
>  > Something is not adding up here.  Can you EXPLAIN ANALYZE the 26 minute
>  > query?
>  Here it is -
>  
>  
 
There is one thing I have not mentioned. I am pretty sure it has no effect on the outcome, but just in case, here it is.
 
The underlying physical tables, ‘ar_tran_inv’, ‘ar_tran_crn’, and ‘ar_tran_rec’, have this index declared -
 
    CREATE UNIQUE INDEX _ar_tran_inv ON ar_tran_inv (tran_number) WHERE deleted_id = 0;
 
and similar for the other two tables.
 
I have not included the column ‘deleted_id’ in the VIEW ‘ar_trans’, so I cannot add ‘WHERE deleted_id = 0’ to any queries.
 
This could mean a slow result if sorting by ‘tran_number’ or joining on ‘tran_number’.
 
However, as this particular query joins on ‘tran_type’ (a literal string) and ‘tran_row_id’ (the primary key to the underlying table), I don’t think  it causes a problem.
 
[UPDATE] I actually tried adding ‘deleted_id’ to the VIEW, and including it in the WHERE clause, but the timings did not improve.
 
Frank
 

Re: [GENERAL] a JOIN to a VIEW seems slow

От
"Frank Millman"
Дата:
 
Sent: Friday, September 22, 2017 7:34 AM
Subject: Re: a JOIN to a VIEW seems slow
 
 
On Fri, Sep 22, 2017 at 7:34 AM, Frank Millman wrote:
>  
>  
On Thu, Sep 21, 2017 at 3:54 PM, Merlin Moncure wrote:
>  
> Something is not adding up here.  Can you EXPLAIN ANALYZE the 26 minute
>  
> query?
>  

 
>  
>
Here it is -
>
>
 
Just checking – is this under investigation, or is this thread considered closed?
 
Frank
 
 

Re: [GENERAL] a JOIN to a VIEW seems slow

От
Jan de Visser
Дата:
On Monday, October 2, 2017 2:32:34 AM EDT Frank Millman wrote:
> From: Frank Millman
> Sent: Friday, September 22, 2017 7:34 AM
> To: pgsql-general@postgresql.org
> Subject: Re: a JOIN to a VIEW seems slow
>
> On Fri, Sep 22, 2017 at 7:34 AM, Frank Millman wrote:
> >  On Thu, Sep 21, 2017 at 3:54 PM, Merlin Moncure wrote:
> >  > Something is not adding up here.  Can you EXPLAIN ANALYZE the 26 minute
> >  > query?
> >  >
>  >  > Here it is -
> >
> > https://explain.depesz.com/s/cwm
>
> Just checking – is this under investigation, or is this thread considered
> closed?

That's not how it works. This is a community list; if somebody finds your
problem interesting you will get feedback, but there is no guarantee you will
get any.

One suggestion is to provide as much details as possible (server version info,
table definitions, full SQL statements, explain results, etc). I haven't looked
back in the thread to see if yu actually did this, but that's often the reason
for no/little feedback.

Another thing is to make life easier on the other list members by adhering to
community conventions, i.e. text-only (no HTML) emails, reply trimming, and
bottom posting.




--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] a JOIN to a VIEW seems slow

От
Alban Hertroys
Дата:
> On 2 Oct 2017, at 8:32, Frank Millman <frank@chagford.com> wrote:
>
>
> From: Frank Millman
> Sent: Friday, September 22, 2017 7:34 AM
> To: pgsql-general@postgresql.org
> Subject: Re: a JOIN to a VIEW seems slow
>
>
> On Fri, Sep 22, 2017 at 7:34 AM, Frank Millman wrote:
> >
> >   On Thu, Sep 21, 2017 at 3:54 PM, Merlin Moncure wrote:
> >   > Something is not adding up here.  Can you EXPLAIN ANALYZE the 26 minute
> >   > query?
> >
>   >   > Here it is -
> >
>   > https://explain.depesz.com/s/cwm
> >
>
> Just checking – is this under investigation, or is this thread considered closed?
>
> Frank

There are a few problems keeping track of this issue. First of all, above plan does not include the query (I don't know
whetherthat's a thing with depesz's plan analyzer, but ISTR seeing plans _with_ their queries in other cases). That
meanswe have to track back through the thread (quite a bit) to find a query that _may_ be the one that the plan is for.
Addthat to the fact that most of us are busy people, so we have to invest too much time into your problem to be able to
help- and hence we don't. 

The second problem is that the query plan highlights a couple of slow sequential scans on ar_tran_inv and ar_tran_rec,
butthose tables are not in your query from Sep 21st. That makes it impossible for anyone to pinpoint the problem for
you.They're probably in your views somewhere, but we can't see where. 

Looking at that query though, it seems to me that it could help to aggregate the results on cust_row_id in the inner
query(aliased as q) to reduce the result set that the outer query needs to handle. It's possible that the query planner
issmart enough to detect this, I can't tell from the plan, but I wouldn't bet on it. The query plan for that inner
queryshould be interesting as well, especially if moving the aggregation inside does not help. 

Another possible optimisation would be to add a lower bound for tran_date, if such is possible for your case. Currently
youonly have an upper bound: tran_date <= '2015-09-30'. 
Even if there is no data from, say, before 2015-01-01, the query planner does not necessarily know that and may assume
thatmost rows in the table/view will match the upper-bound condition - in which case a sequential scan is probably
fastest.That may be why you don't see Postgres use the index on those columns you mentioned at some point. 

Now, apparently you have an index on columns tran_type and tran_row_id, while tran_row_id is the primary key? In that
caseI'd suggest you drop that index: Any value of tran_row_id will have a single value of tran_type and you're joining
onthe PK already. Meanwhile, the condition on tran_type in the query only serves to complicate the query. 

Oh, BTW, those LEFT JOINs will behave as INNER JOINs, because you use columns from the right-hand side
(alloc_trans.tran_dateand due_trans.tran_date respectively) in your WHERE clauses without allowing them to be NULL. If
youwant those parts to behave like proper LEFT JOINs, either add OR xxx.tran_date IS NULL or move those expressions
intothe JOIN conditions. 

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] a JOIN to a VIEW seems slow

От
"Frank Millman"
Дата:
 
On Wednesday, October 4, 2017 06:07 PM Jan de Visser wrote:
> On Monday, October 2, 2017 2:32:34 AM EDT Frank Millman wrote:
> >
> > Just checking – is this under investigation, or is this thread considered closed?
>
> That's not how it works. This is a community list; if somebody finds your
> problem interesting you will get feedback, but there is no guarantee you will
> get any.
>
 
I appreciate the reply, Jan, but I am actually aware of that.
 
I had posted my query earlier, with full details, and received some responses. I provided additional information as requested, but then the responses dried up. I was just wondering why. If the answer is ‘lack of interest’, I am quite ok with that. I do have a workaround, so I will bash on regardless.
 
Thanks
 
Frank
 

Re: [GENERAL] a JOIN to a VIEW seems slow

От
"Frank Millman"
Дата:

On 4 Oct 2017, at 9:19 PM, Alban Hertroys wrote:

> On 2 Oct 2017, at 8:32, Frank Millman <frank@chagford.com> wrote:
>

> On Fri, Sep 22, 2017 at 7:34 AM, Frank Millman wrote:
> > 
> >   On Thu, Sep 21, 2017 at 3:54 PM, Merlin Moncure wrote:
> >   > Something is not adding up here.  Can you EXPLAIN ANALYZE the 26 minute
> >   > query?
> >  
>   >   > Here it is -
> >
>   > https://explain.depesz.com/s/cwm
> >
>

>
> Just checking – is this under investigation, or is this thread considered closed?
>

>
There are a few problems keeping track of this issue. First of all, above plan does not include the query (I don't know whether that's a thing with depesz's plan analyzer, but ISTR
>
seeing plans _with_ their queries in other cases). That means we have to track back through the thread (quite a bit) to find a query that _may_ be the one that the plan is for. Add
>
that to the fact that most of us are busy people, so we have to invest too much time into your problem to be able to help - and hence we don't.

[snip a lot of good suggestions]
 
Thanks for the reply, Alban.
 
I agree that the query is a bit complex and not easy to analyse. I have taken note of all your suggestions and will investigate them further.
 
I did attempt to narrow this down to a simple example in one of my earlier posts. I could not find a way to provide a link to a single message, but this is the thread -
 
 
and the relevant post is the third one down, dated 21st September.
 
You will have to read the whole message for the details, but the key point was the difference between these two queries -
 
2. =======================================================
 
EXPLAIN SELECT * FROM ar_trans WHERE tran_type = ‘ar_inv’ AND tran_row_id = 1;
 
                                        QUERY PLAN                                       
-------------------------------------------------------------------------------------------
Append  (cost=0.29..8.32 rows=1 width=117)
   ->  Index Scan using ar_tran_inv_pkey on ar_tran_inv  (cost=0.29..8.31 rows=1 width=46)
         Index Cond: (row_id = 1)
         Filter: posted
(4 rows)
 
This is a select against the view ‘ar_trans’. It has worked out that the underlying table to use is ‘ar_tran_inv’, and performed an indexed read.
 
4. =======================================================
 
EXPLAIN SELECT * FROM ar_trans_due a LEFT JOIN ar_trans b ON b.tran_type = a.tran_type AND b.tran_row_id = a.tran_row_id where a.row_id = 1;
 
                                             QUERY PLAN                                            
-----------------------------------------------------------------------------------------------------
Hash Right Join  (cost=8.32..2072.99 rows=1 width=169)
   Hash Cond: ((('ar_inv'::text) = (a.tran_type)::text) AND (ar_tran_inv.row_id = a.tran_row_id))
   ->  Append  (cost=0.00..1739.47 rows=43357 width=117)
         ->  Seq Scan on ar_tran_inv  (cost=0.00..676.01 rows=21601 width=46)
               Filter: posted
         ->  Seq Scan on ar_tran_crn  (cost=0.00..13.88 rows=155 width=124)
               Filter: posted
         ->  Seq Scan on ar_tran_rec  (cost=0.00..616.01 rows=21601 width=40)
               Filter: posted
   ->  Hash  (cost=8.31..8.31 rows=1 width=52)
         ->  Index Scan using ar_trans_due_pkey on ar_trans_due a  (cost=0.29..8.31 rows=1 width=52)
               Index Cond: (row_id = 1)
(12 rows)
 
Here I have set up a join against the view ‘ar_trans’. It seems to have all the information necessary to perform an indexed read, but instead it performs a sequential scan of all three of the underlying tables.
 
If anyone wants to take this further, maybe this is a good place to start.
 
I do have a workaround. It is not pretty – denormalise my data to avoid the need for a join against the view. But it works, so there is no longer any urgency on my part.
 
Thanks
 
Frank
 

Re: [GENERAL] a JOIN to a VIEW seems slow

От
"Frank Millman"
Дата:

On 5 Oct 2017, at 8:20 AM, Frank Millman wrote:
> If anyone wants to take this further, maybe this is a good place to start.
 
I should have re-stated the reason for my original post.
 
Exactly the same query, on exactly the same data, takes 1.8 seconds on Sql Server, 1.0 seconds on SQLite3, and 1607 seconds, or 26 minutes, on PostgreSQL 9.4.4.
 
Frank
 

Re: [GENERAL] a JOIN to a VIEW seems slow

От
"Frank Millman"
Дата:
On 5 Oct 2017, at 9:51 AM, Frank Millman wrote:
>
> I should have re-stated the reason for my original post.
>  
> Exactly the same query, on exactly the same data, takes 1.8 seconds on Sql Server, 1.0 seconds on SQLite3, and 1607 seconds, or 26 minutes, on PostgreSQL 9.4.4.
>  
 
I will give this another shot. I have made this as simple as I can. Just as a reminder, ‘ar_trans’ is a VIEW.
 
1. Simple select from ‘artrans_due’
    SELECT *
        FROM ar_trans_due
 
    Sql Server: 0.56 sec; PostgreSQL 0.41 sec
 
2. Select from ‘ar_trans_due’ including join to ‘ar_trans’
    SELECT *
        FROM ar_trans_due a
        LEFT JOIN ar_trans b
           ON b.tran_type = a.tran_type
           AND b.tran_row_id = a.tran_row_id
 
    Sql Server: 0.90 sec; PostgreSQL 0.70 sec
 
3. Select from ar_trans_due including join to ar_trans,
        plus sub-select from ar_trans_alloc
    SELECT *,
        (SELECT SUM(c.alloc_cust)
            FROM ar_trans_alloc c
            WHERE c.due_row_id = a.row_id)
        FROM ar_trans_due a
        LEFT JOIN ar_trans b
           ON b.tran_type = a.tran_type
           AND b.tran_row_id = a.tran_row_id
 
    Sql Server: 0.92 sec; PostgreSQL 1.00 sec
 
4. Select from ar_trans_due including join to ar_trans,
        plus sub_select from ar_trans_alloc including join to ar_trans
    SELECT *,
        (SELECT SUM(c.alloc_cust)
            FROM ar_trans_alloc c
            LEFT JOIN ar_trans d
               ON d.tran_type = c.tran_type
               AND d.tran_row_id = c.tran_row_id
            WHERE c.due_row_id = a.row_id)
        FROM ar_trans_due a
        LEFT JOIN ar_trans b
           ON b.tran_type = a.tran_type
           AND b.tran_row_id = a.tran_row_id
 
    Sql Server: 1.01 sec; PostgreSQL 1683 sec
 
As you can see, it is the join inside the sub-select that kills it.
 
Someone has kindly tested this for me on version 9.6.5 and on version 10, and the results are similar.
 
Here is the EXPLAIN ANALYSE for the last of the above queries -
 
 
Frank Millman
 

Re: [GENERAL] a JOIN to a VIEW seems slow

От
David Rowley
Дата:
On 7 October 2017 at 22:34, Frank Millman <frank@chagford.com> wrote:
> 4. Select from ar_trans_due including join to ar_trans,
>         plus sub_select from ar_trans_alloc including join to ar_trans
>     SELECT *,
>         (SELECT SUM(c.alloc_cust)
>             FROM ar_trans_alloc c
>             LEFT JOIN ar_trans d
>                ON d.tran_type = c.tran_type
>                AND d.tran_row_id = c.tran_row_id
>             WHERE c.due_row_id = a.row_id)
>         FROM ar_trans_due a
>         LEFT JOIN ar_trans b
>            ON b.tran_type = a.tran_type
>            AND b.tran_row_id = a.tran_row_id
>
>     Sql Server: 1.01 sec; PostgreSQL 1683 sec

Yeah, PostgreSQL does not make any effort to convert subqueries in the
target list into joins. SQL server does.

The way you have written the query might be good if there are not so
many rows in the outer part of the query, however, as the number of
rows increases then performance will get worse pretty quickly.

You'll probably find it'll run faster if you convert the subquery in
the target list into a join with a GROUP BY, like:

SELECT a.*,b.*,c.sum_alloc_cust
FROM ar_trans_due a
LEFT JOIN ar_trans b   ON b.tran_type = a.tran_type   AND b.tran_row_id = a.tran_row_id
LEFT JOIN (SELECT c.due_row_id,   SUM(c.alloc_cust) AS sum_alloc_cust   FROM ar_trans_alloc c   LEFT JOIN ar_trans d
  ON d.tran_type = c.tran_type       AND d.tran_row_id = c.tran_row_id   GROUP BY c.due_row_id
 
) c ON c.due_row_id = a.row_id;

SQL Server will probably be doing this rewrite.

-- David Rowley                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] a JOIN to a VIEW seems slow

От
"Frank Millman"
Дата:
 
On 7 October 2017 at 2:49 PM, David Rowley wrote:
>
>
Yeah, PostgreSQL does not make any effort to convert subqueries in the
>
target list into joins. SQL server does.
[...]
>
You'll probably find it'll run faster if you convert the subquery in
>
the target list into a join with a GROUP BY, like ...
>
 
Wow, David, that is perfect – 0.91 seconds. Problem well and truly solved.
 
It took me a little while to modify my original query to use that concept, but now it flies.
 
Many thanks
 
Frank