Обсуждение: [GENERAL] Row value expression much faster than equivalent OR clauses

Поиск

Список

Период

Сортировка

[GENERAL] Row value expression much faster than equivalent OR clauses

От

Steven Grimm

Дата:

28 декабря 2016 г., 19:58:24

A library my application is using does a "scan a batch at a time" loop over a table of events, keeping track of its last position so it can start the next query in the right place.

SELECT eventIdentifier, aggregateIdentifier, sequenceNumber, timeStamp, payloadType,
    payloadRevision, payload, metaData
FROM DomainEventEntry e
WHERE ((e.timeStamp > '2016-12-19T20:34:22.315Z')
        OR (e.timeStamp = '2016-12-19T20:34:22.315Z'
            AND e.sequenceNumber > 0)
        OR (e.timeStamp = '2016-12-19T20:34:22.315Z'
            AND e.sequenceNumber = 0
            AND e.aggregateIdentifier > 'dev:642e1953-2562-4768-80d9-0c3af9b0ff84'))
AND (type = 'transAggPrototype')
ORDER BY e.timeStamp ASC, e.sequenceNumber ASC, e.aggregateIdentifier ASC
LIMIT 100;

This uses the index on the three columns it's using for ordering of events, but (if I'm reading the explain output correctly) does a full scan of the index.

Limit (cost=0.55..1349.44 rows=100 width=576) (actual time=526.814..527.238 rows=100 loops=1)
   -> Index Scan using domainevententry_ts_seq_agg on domainevententry e (cost=0.55..92494.44 rows=6857 width=576) (actual time=526.811..527.035 rows=100 loops=1)
         Filter: (((type)::text = 'transAggPrototype'::text) AND ((("timestamp")::text > '2016-12-19T20:34:22.315Z'::text) OR ((("timestamp")::text = '2016-12-19T20:34:22.315Z'::text) AND (sequencenumber > 0)) OR ((("timestamp")::text = '2016-12-19T20:34:22.315Z'::text) AND (sequencenumber = 0) AND ((aggregateidentifier)::text > 'dev:642e1953-2562-4768-80d9-0c3af9b0ff84'::text))))
         Rows Removed by Filter: 332183
Planning time: 1.893 ms
Execution time: 527.368 ms

I played around with it a little and one thing I tried was to restructure the WHERE clause using a row value expression that's semantically equivalent to the original.

SELECT eventIdentifier, aggregateIdentifier, sequenceNumber, timeStamp, payloadType,
    payloadRevision, payload, metaData
FROM DomainEventEntry e
WHERE (e.timeStamp, e.sequenceNumber, e.aggregateIdentifier) >
        ('2016-11-19T20:34:22.315Z', 0, 'dev:642e1953-2562-4768-80d9-0c3af9b0ff84')
AND (type = 'transAggPrototype')
ORDER BY e.timeStamp ASC, e.sequenceNumber ASC, e.aggregateIdentifier ASC
LIMIT 100;

This ends up being a LOT faster:

Limit (cost=0.55..56.81 rows=100 width=576) (actual time=0.065..0.667 rows=100 loops=1)
   -> Index Scan using domainevententry_ts_seq_agg on domainevententry e (cost=0.55..65581.93 rows=116573 width=576) (actual time=0.062..0.437 rows=100 loops=1)
         Index Cond: (ROW(("timestamp")::text, sequencenumber, (aggregateidentifier)::text) > ROW('2016-11-19T20:34:22.315Z'::text, 0, 'dev:642e1953-2562-4768-80d9-0c3af9b0ff84'::text))
         Filter: ((type)::text = 'transAggPrototype'::text)
         Rows Removed by Filter: 235
Planning time: 1.705 ms
Execution time: 0.795 ms

I wonder if the query planner could recognize that the two queries are equivalent and choose the second plan for the OR-clause version, or at least use the index more efficiently. This is on PostgreSQL 9.5.2.

-Steve

Re: [GENERAL] Row value expression much faster than equivalent OR clauses

От

Kevin Grittner

Дата:

04 января 2017 г., 17:49:06

On Wed, Dec 28, 2016 at 10:58 AM, Steven Grimm <sgrimm@thesegovia.com> wrote:

> WHERE ((e.timeStamp > '2016-12-19T20:34:22.315Z')
>         OR (e.timeStamp = '2016-12-19T20:34:22.315Z'
>             AND e.sequenceNumber > 0)
>         OR (e.timeStamp = '2016-12-19T20:34:22.315Z'
>             AND e.sequenceNumber = 0
>             AND e.aggregateIdentifier >
> 'dev:642e1953-2562-4768-80d9-0c3af9b0ff84'))

> This uses the index on the three columns it's using for ordering of events,
> but (if I'm reading the explain output correctly) does a full scan of the
> index.

> I played around with it a little and one thing I tried was to restructure
> the WHERE clause using a row value expression that's semantically equivalent
> to the original.

> WHERE (e.timeStamp, e.sequenceNumber, e.aggregateIdentifier) >
>         ('2016-11-19T20:34:22.315Z', 0, 'dev:642e1953-2562-4768-80d9-0c3af9b0ff84')

> This ends up being a LOT faster:

Yup.

> I wonder if the query planner could recognize that the two queries are
> equivalent and choose the second plan for the OR-clause version, or at least
> use the index more efficiently.

Theoretically it could, but that would add significant time to
planning for a large number of queries, with no benefit to those
who explicitly write the query in the faster (and more concise!)
fashion.

You could come a lot closer to the performance of the row value
expression technique by using the logical equivalent of your
original query that puts AND at the higher level and OR at the
lower level.  (Having OR at the top is generally inefficient.)

WHERE (e.timeStamp >= '2016-12-19T20:34:22.315Z'
  AND (e.timeStamp >  '2016-12-19T20:34:22.315Z'
     OR (e.sequenceNumber >= 0
    AND (e.sequenceNumber >  0
       OR (e.aggregateIdentifier >
'dev:642e1953-2562-4768-80d9-0c3af9b0ff84')))))

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [GENERAL] Row value expression much faster than equivalent OR clauses

От

Merlin Moncure

Дата:

05 января 2017 г., 22:14:41

On Wed, Jan 4, 2017 at 8:49 AM, Kevin Grittner <kgrittn@gmail.com> wrote:
> On Wed, Dec 28, 2016 at 10:58 AM, Steven Grimm <sgrimm@thesegovia.com> wrote:
>
>> WHERE ((e.timeStamp > '2016-12-19T20:34:22.315Z')
>>         OR (e.timeStamp = '2016-12-19T20:34:22.315Z'
>>             AND e.sequenceNumber > 0)
>>         OR (e.timeStamp = '2016-12-19T20:34:22.315Z'
>>             AND e.sequenceNumber = 0
>>             AND e.aggregateIdentifier >
>> 'dev:642e1953-2562-4768-80d9-0c3af9b0ff84'))
>
>> This uses the index on the three columns it's using for ordering of events,
>> but (if I'm reading the explain output correctly) does a full scan of the
>> index.
>
>> I played around with it a little and one thing I tried was to restructure
>> the WHERE clause using a row value expression that's semantically equivalent
>> to the original.
>
>> WHERE (e.timeStamp, e.sequenceNumber, e.aggregateIdentifier) >
>>         ('2016-11-19T20:34:22.315Z', 0, 'dev:642e1953-2562-4768-80d9-0c3af9b0ff84')
>
>> This ends up being a LOT faster:
>
> Yup.
>
>> I wonder if the query planner could recognize that the two queries are
>> equivalent and choose the second plan for the OR-clause version, or at least
>> use the index more efficiently.
>
> Theoretically it could, but that would add significant time to
> planning for a large number of queries, with no benefit to those
> who explicitly write the query in the faster (and more concise!)
> fashion.
>
> You could come a lot closer to the performance of the row value
> expression technique by using the logical equivalent of your
> original query that puts AND at the higher level and OR at the
> lower level.  (Having OR at the top is generally inefficient.)
>
> WHERE (e.timeStamp >= '2016-12-19T20:34:22.315Z'
>   AND (e.timeStamp >  '2016-12-19T20:34:22.315Z'
>      OR (e.sequenceNumber >= 0
>     AND (e.sequenceNumber >  0
>        OR (e.aggregateIdentifier >
> 'dev:642e1953-2562-4768-80d9-0c3af9b0ff84')))))

In practice this can utilize the first part of the key only.  So
performance will be good if "timeStamp" is selective and poor if it
isn't.   In this query, I'd venture to guess it'd be pretty good.  The
row-wise comparison feature was built specifically to handle this type
of query and it's additionally much more concise as you noted, so I'd
agree; effort is better spent on client side education than on complex
rewriting strategies.

This type of query comes up a lot in applications that were converted from ISAM.

merlin

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: [GENERAL] Row value expression much faster than equivalent OR clauses

[GENERAL] Row value expression much faster than equivalent OR clauses

Re: [GENERAL] Row value expression much faster than equivalent OR clauses

Re: [GENERAL] Row value expression much faster than equivalent OR clauses