Re: Limitting full join to one match

Поиск
Список
Период
Сортировка
От Sergei Agalakov
Тема Re: Limitting full join to one match
Дата
Msg-id 0a683148-5330-bd53-c3dc-7fb953e73874@gmail.com
обсуждение исходный текст
Список pgsql-hackers
On 12/5/2018 8:30 PM, John W Higgins wrote:


On Wed, Dec 5, 2018 at 4:34 PM Phil Endecott <spam_from_pgsql_lists@chezphil.org> wrote:
Dear Experts,

I have a couple of tables that I want to reconcile, finding rows
that match and places where rows are missing from one table or the
other:

...
 
So my question is: how can I modify my query to output only two rows,
like this:?

+------------+--------+------------+--------+
|    date    | amount |    date    | amount |
+------------+--------+------------+--------+
| 2018-01-01 |  10.00 | 2018-01-01 |  10.00 |
| 2018-02-01 |   5.00 |            |        |
|            |        | 2018-03-01 |   8.00 |
| 2018-04-01 |   5.00 | 2018-04-01 |   5.00 |
| 2018-05-01 |  20.00 | 2018-05-01 |  20.00 |  1
| 2018-05-01 |  20.00 | 2018-05-01 |  20.00 |  2
+------------+--------+------------+--------+


Evening Phil,

Window functions are your friend here. I prefer views for this stuff - but subqueries would work just fine.

create view a_rows as (select *, 
                       row_number() OVER (PARTITION BY date, amount) AS pos from a);
create view b_rows as (select *, 
                       row_number() OVER (PARTITION BY date, amount) AS pos from b);

select 
  a_rows.date, 
  a_rows.amount, 
  a_rows.pos,
  b_rows.date, 
  b_rows.amount,
  b_rows.pos
from 
  a_rows full join b_rows using (date,amount,pos);


John 

Any suggestions anyone?


The best I have found so far is something involving EXCEPT ALL:

db=> select * from a except all select * from b;
db=> select * from b except all select * from a;

That's not ideal, though, as what I ultimately want is something
that lists everything with its status:

+------------+--------+--------+
|    date    | amount | status |
+------------+--------+--------+
| 2018-01-01 |  10.00 |   OK   |
| 2018-02-01 |   5.00 | a_only |
| 2018-03-01 |   8.00 | b_only |
| 2018-04-01 |   5.00 |   OK   |
| 2018-05-01 |  20.00 |   OK   |
| 2018-05-01 |  20.00 |   OK   |
+------------+--------+--------+

That would be easy enough to achieve from the JOIN.


Thanks, Phil.


This question is always asked time to time.
I have found an old article with so far the best solution for big tables.
https://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:2151582681236#15393095283923

On the same test data
create table a (date date, amount money);
create table b (date date, amount money);

insert into a values ('2018-01-01', 10);
insert into a values ('2018-02-01', 5);
insert into a values ('2018-04-01', 5);
insert into a values ('2018-05-01', 20);
insert into a values ('2018-05-01', 20);
insert into b values ('2018-01-01', 10);
insert into b values ('2018-03-01', 8);
insert into b values ('2018-04-01', 5);
insert into b values ('2018-05-01', 20);
insert into b values ('2018-05-01', 20);

select tt.date,
       tt.amount,
       count(tt.src1) CNT1,
       count(tt.src2) CNT2
 from
(
 select a.date,
        a.amount,
        1 src1,
        null::integer src2
   from a
 union all
 select b.date,
        b.amount, 
        null::integer src1,
        2 src2
  from b
  ) tt
group by tt.date, tt.amount;

date          amount    cnt1    cnt2
2018-01-01    $10.00     1       1
2018-02-01    $5.00      1       0
2018-03-01    $8.00      0       1
2018-04-01    $5.00      1       1
2018-05-01    $20.00     2       2

It requires a sort, so you may want to increase work_mem before execution, and then return it back like
SET work_mem = '512MB';
...  run your query
RESET work_mem;

Regards,

Sergei Agalakov

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Michael Paquier
Дата:
Сообщение: Re: Use durable_unlink for .ready and .done files for WAL segmentremoval
Следующее
От: "Takahashi, Ryohei"
Дата:
Сообщение: RE: Too many logs are written on Windows (LOG: could not reserveshared memory region (addr=%p) for child %p:)