On Tue, Apr 4, 2017 at 8:20 AM, Daniel Westermann
<daniel.westermann@dbi-services.com> wrote:
> Hi,
>
> PostgreSQL 9.6.2 on CentOS 7.3 x64.
>
> This is my data set:
>
> drop table if exists users;
> drop table if exists ids;
> create table users ( user_id int
> , username varchar(50)
> );
> with generator as
> ( select a.*
> from generate_series (1,3000000) a
> order by random()
> )
> insert into users ( user_id
> , username
> )
> select a
> , md5(a::varchar)
> from generator;
> create unique index i_users on users ( user_id );
> create table ids ( id int );
> insert into ids (id) values ( generate_series ( 2500000, 3500000 ) );
> create unique index i_ids on ids ( id );
> analyze users;
> analyze ids;
>
> I have set work_mem to a very low value intentionally for demonstration
> purposes:
>
> postgres=# show work_mem;
> work_mem
> ----------
> 16MB
> (1 row)
>
> postgres=# show shared_buffers ;
> shared_buffers
> ----------------
> 128MB
> (1 row)
>
>
> When I run the following query ( I know that "not in" is not a good choice
> here ):
>
> postgres=# select count(user_id) from users where user_id not in ( select id
> from ids);
"NOT IN" where the predate is a table column can lead to very poor
query plans especially where the haystack is not provably known (at
plan time) to contain only not null values. By reducing work_mem, the
server has decided has to repeatedly search the table to search for
the presence of null values. Try converting the query to NOT EXISTS.
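
A sketch of that rewrite against the tables from the example above
(NOT EXISTS has the same result with or without nulls in "ids", so the
planner is free to use an anti-join):

```sql
-- NOT EXISTS is null-safe, so the planner can pick a hash anti-join
-- instead of falling back to a per-row subplan under low work_mem:
select count(u.user_id)
from users u
where not exists ( select 1
                   from ids i
                   where i.id = u.user_id );
```

For contrast, NOT IN yields no rows as soon as the subquery can return
a NULL: "select 1 where 1 not in (select null::int);" returns zero rows.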
merlin