Suboptimal query plan when using expensive BCRYPT functions

Поиск
Список
Период
Сортировка
От Erik van Zijst
Тема Suboptimal query plan when using expensive BCRYPT functions
Дата
Msg-id CA+69USsLsVb8U5=r7KYs9GYT4Rx9DtyaohsfHvLzVKS=fZNUEw@mail.gmail.com
обсуждение исходный текст
Ответы Re: Suboptimal query plan when using expensive BCRYPT functions  (bricklen <bricklen@gmail.com>)
Re: Suboptimal query plan when using expensive BCRYPT functions  (Heikki Linnakangas <hlinnakangas@vmware.com>)
Список pgsql-performance
Hi there,

I've got a relatively simple query that contains expensive BCRYPT
functions that gets optimized in a way that causes postgres to compute
more bcrypt hashes than necessary, thereby dramatically slowing things
down.

In a certain part of our application we need to lookup users by their
username, email address and password. Now we don't store plaintext
passwords and so the query needs to compute bcrypt hashes on the fly:

    SELECT DISTINCT u.*
    FROM auth_user u
    JOIN bb_userprofile p ON p.user_id = u.id
    JOIN bb_identity i ON i.profile_id = p.id
    WHERE
    (
      (
        u.username ILIKE 'detkin'
        OR
        i.email ILIKE 'foo@example.com'
      )
      AND
      (
        SUBSTRING(password FROM 8) = CRYPT(
          'detkin', SUBSTRING(password FROM 8))
      )
    )

These queries are generated by a parser that translates from an
external query language to SQL run on the database. This test db
contains 12 user records.

With a single bcrypt hash taking ~300ms to compute, this is a recipe
for disaster and so the app only allows queries that require only a
very small number of bcrypt computation.

E.g. the user must always "AND" the password lookup with a clause like
" username = 'foo' AND password = 'bar'" (username is unique).

However, while the query above technically only needs to compute 1
hash (there is a user 'detkin' and email 'foo@example.com' does not
exist), it instead creates a query plan that computes hashes *before*
filtering on username and email, leading to 12 hash computations and a
very slow query.

The EXPLAIN (ANALYZE, BUFFERS) is here: http://explain.depesz.com/s/yhE

The schemas for the 3 tables involved are here:
http://pgsql.privatepaste.com/f72020ad0a

As a quick experiment I tried moving the joins and email lookup into a
nested IN query, but that still generates a plan that computes hashes
for all 12 users, before picking out the 1 whose username matches.

Is there any way I can get postgres to perform the hash calculations
on the *result* of the other parts of the where clause, instead of the
other way around? Or else rewrite the query?

Cheers,
Erik


В списке pgsql-performance по дате отправления:

Предыдущее
От: Shaun Thomas
Дата:
Сообщение: Re: Getting query plan alternatives from query planner?
Следующее
От: bricklen
Дата:
Сообщение: Re: Suboptimal query plan when using expensive BCRYPT functions