Обсуждение: I don't understand something...

Поиск
Список
Период
Сортировка

I don't understand something...

От
Alexander Pyhalov
Дата:
Hello.
I was asked a simple question. We have table employees:
  \d employees
                                             Table "public.employees"
      Column     |            Type             |
     Modifiers
----------------+-----------------------------+-----------------------------------------------------------------
  employee_id    | integer                     | not null default
nextval('employees_employee_id_seq'::regclass)
  first_name     | character varying(20)       |
  last_name      | character varying(25)       | not null
  email          | character varying(25)       | not null
  phone_number   | character varying(20)       |
  hire_date      | timestamp without time zone | not null
  job_id         | character varying(10)       | not null
  salary         | numeric(8,2)                |
  commission_pct | numeric(2,2)                |
  manager_id     | integer                     |
  department_id  | integer                     |
Indexes:
     "employees_pkey" PRIMARY KEY, btree (employee_id)
     "emp_email_uk" UNIQUE, btree (email)
     "emp_department_ix" btree (department_id)
     "emp_job_ix" btree (job_id)
     "emp_manager_ix" btree (manager_id)
     "emp_name_ix" btree (last_name, first_name)
Check constraints:
     "emp_salary_min" CHECK (salary > 0::numeric)
Foreign-key constraints:
     "employees_department_id_fkey" FOREIGN KEY (department_id)
REFERENCES departments(department_id)
     "employees_job_id_fkey" FOREIGN KEY (job_id) REFERENCES jobs(job_id)
     "employees_manager_id_fkey" FOREIGN KEY (manager_id) REFERENCES
employees(employee_id)
Referenced by:
     TABLE "departments" CONSTRAINT "dept_mgr_fk" FOREIGN KEY
(manager_id) REFERENCES employees(employee_id)
     TABLE "employees" CONSTRAINT "employees_manager_id_fkey" FOREIGN
KEY (manager_id) REFERENCES employees(employee_id)
     TABLE "job_history" CONSTRAINT "job_history_employee_id_fkey"
FOREIGN KEY (employee_id) REFERENCES employees(employee_id)

Now we want to select count of all employees who doesn't have any
subordinates (query 1):
  SELECT count(employee_id) from employees o where not exists  (select 1
from employees  where manager_id=o.employee_id);
  count
-------
     89
(1 row)

We can select count of all managers (query 2):
SELECT count(employee_id) from employees where employee_id  in (select
manager_id from employees);
  count
-------
     18
(1 row)

But if we reformulate the first query in the same way, answer is
different (query 3):
SELECT count(employee_id) from employees where employee_id not in
(select manager_id from employees) (query 3);
  count
-------
      0
(1 row)

I don't understand why queries 1 and 3 give different results. They
seems to be the same... Could someone explain the difference?

--
Best regards,
Alexander Pyhalov,
system administrator of Computer Center of Southern Federal University

Re: I don't understand something...

От
Achilleas Mantzios
Дата:
Alexander, that's a classic one,
rewrite your last query as :

SELECT count(employee_id) from employees where employee_id not in
    (select manager_id from employees WHERE manager_id  IS NOT NULL);

NULLS semantics are sometimes not so obvious.

Στις Monday 03 October 2011 09:33:12 ο/η Alexander Pyhalov έγραψε:
> Hello.
> I was asked a simple question. We have table employees:
>   \d employees
>                                              Table "public.employees"
>       Column     |            Type             |
>      Modifiers
> ----------------+-----------------------------+-----------------------------------------------------------------
>   employee_id    | integer                     | not null default
> nextval('employees_employee_id_seq'::regclass)
>   first_name     | character varying(20)       |
>   last_name      | character varying(25)       | not null
>   email          | character varying(25)       | not null
>   phone_number   | character varying(20)       |
>   hire_date      | timestamp without time zone | not null
>   job_id         | character varying(10)       | not null
>   salary         | numeric(8,2)                |
>   commission_pct | numeric(2,2)                |
>   manager_id     | integer                     |
>   department_id  | integer                     |
> Indexes:
>      "employees_pkey" PRIMARY KEY, btree (employee_id)
>      "emp_email_uk" UNIQUE, btree (email)
>      "emp_department_ix" btree (department_id)
>      "emp_job_ix" btree (job_id)
>      "emp_manager_ix" btree (manager_id)
>      "emp_name_ix" btree (last_name, first_name)
> Check constraints:
>      "emp_salary_min" CHECK (salary > 0::numeric)
> Foreign-key constraints:
>      "employees_department_id_fkey" FOREIGN KEY (department_id)
> REFERENCES departments(department_id)
>      "employees_job_id_fkey" FOREIGN KEY (job_id) REFERENCES jobs(job_id)
>      "employees_manager_id_fkey" FOREIGN KEY (manager_id) REFERENCES
> employees(employee_id)
> Referenced by:
>      TABLE "departments" CONSTRAINT "dept_mgr_fk" FOREIGN KEY
> (manager_id) REFERENCES employees(employee_id)
>      TABLE "employees" CONSTRAINT "employees_manager_id_fkey" FOREIGN
> KEY (manager_id) REFERENCES employees(employee_id)
>      TABLE "job_history" CONSTRAINT "job_history_employee_id_fkey"
> FOREIGN KEY (employee_id) REFERENCES employees(employee_id)
>
> Now we want to select count of all employees who doesn't have any
> subordinates (query 1):
>   SELECT count(employee_id) from employees o where not exists  (select 1
> from employees  where manager_id=o.employee_id);
>   count
> -------
>      89
> (1 row)
>
> We can select count of all managers (query 2):
> SELECT count(employee_id) from employees where employee_id  in (select
> manager_id from employees);
>   count
> -------
>      18
> (1 row)
>
> But if we reformulate the first query in the same way, answer is
> different (query 3):
> SELECT count(employee_id) from employees where employee_id not in
> (select manager_id from employees) (query 3);
>   count
> -------
>       0
> (1 row)
>
> I don't understand why queries 1 and 3 give different results. They
> seems to be the same... Could someone explain the difference?
>
> --
> Best regards,
> Alexander Pyhalov,
> system administrator of Computer Center of Southern Federal University
>



--
Achilleas Mantzios

Re: I don't understand something...

От
Alexander Pyhalov
Дата:
On 10/03/2011 11:19, Achilleas Mantzios wrote:
> Alexander, that's a classic one,
> rewrite your last query as :
>
> SELECT count(employee_id) from employees where employee_id not in
>     (select manager_id from employees WHERE manager_id  IS NOT NULL);
>
> NULLS semantics are sometimes not so obvious.

Thanks.
It's confusing, that IN check works as expected, but NOT IN works this
way...

--
Best regards,
Alexander Pyhalov,
system administrator of Computer Center of Southern Federal University

Re: I don't understand something...

От
hubert depesz lubaczewski
Дата:
On Mon, Oct 03, 2011 at 11:48:45AM +0400, Alexander Pyhalov wrote:
> On 10/03/2011 11:19, Achilleas Mantzios wrote:
> >Alexander, that's a classic one,
> >rewrite your last query as :
> >
> >SELECT count(employee_id) from employees where employee_id not in
> >    (select manager_id from employees WHERE manager_id  IS NOT NULL);
> >
> >NULLS semantics are sometimes not so obvious.
>
> Thanks.
> It's confusing, that IN check works as expected, but NOT IN works
> this way...

If I might plug:
http://www.depesz.com/index.php/2008/08/13/nulls-vs-not-in/

Best regards,

depesz

--
The best thing about modern society is how easy it is to avoid contact with it.
                                                             http://depesz.com/

Re: I don't understand something...

От
Alban Hertroys
Дата:
On 3 October 2011 08:33, Alexander Pyhalov <alp@rsu.ru> wrote:
> Now we want to select count of all employees who doesn't have any
> subordinates (query 1):
>  SELECT count(employee_id) from employees o where not exists  (select 1 from
> employees  where manager_id=o.employee_id);
>  count
> -------
>    89
> (1 row)
>
> We can select count of all managers (query 2):
> SELECT count(employee_id) from employees where employee_id  in (select
> manager_id from employees);
>  count
> -------
>    18
> (1 row)
>
> But if we reformulate the first query in the same way, answer is different
> (query 3):
> SELECT count(employee_id) from employees where employee_id not in (select
> manager_id from employees) (query 3);
>  count
> -------
>     0
> (1 row)
>
> I don't understand why queries 1 and 3 give different results. They seems to
> be the same... Could someone explain the difference?

That's because NOT IN returns NULL if there are any NULLs in the list.
As the WHERE-clause requires something to evaluate to either true or
false (NULL won't do), you (correctly) get false if someone is a
manager, but also if _anyone_ is NOT a manager.

That's an artefact of how 3-valued logic is implemented in the SQL standard.

--
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.