So I have a 'accounts' table, with id and name, and than some
hypothetical 'packages' table, containing some info per customer.
I need to retrive distinct pairs , of random packages assigned per customer.
Packages table contains 10 packages, id:=[1:10], there's 1M customers
for testing purposes.
I could name the tables foo/bar again, but decided for something more
creative this time ;)
Anyways, I have this query:
select count(distinct (v,id)) from (
select heh.id, v[i] from
(
SELECT ss.id, ARRAY
(
SELECT id FROM packages where ss.id>0 and id between 2 and 6
ORDER BY random() limit 5
) as v FROM
(
SELECT id FROM accounts ORDER BY random() limit 100000
) ss
) heh,generate_series(1, 5 ) i order by heh.id,v
) ziew;
So in theory, that should return me random array of packages, per
account. Since id's in both tables are primary keys, I expect the pair
of accountId/packageId to be unique as well.
The query above doesn't deliver, so I tried to divide it up:
create view hehview as SELECT ss.id, ARRAY
(
SELECT id FROM packages where ss.id>0 and id between 2 and 6
ORDER BY random() limit 5
) as v FROM
(
SELECT id FROM accounts ORDER BY random() limit 100000
) ss
select count( distinct (id, v[i])) from hehview, generate_series(1, 5) i;
That doesn't work either, because postgresql 'merges' view into query
(which is a good way to chop large queries btw, and still keep them up
to speed).
But if I store intermediate result in temporary table, all values are
nicely unique - as I want them.
Now, that's the solution I will use. But for sake of my conciousness,
I want to know what has failed here.
Btw, the count(distinct(x,y)) works that way only on 8.4, but I tested
it on 8.3, and I get same results.
with temp table:
create temp table hehtable as select * from hehview;
select count( distinct (id, v[i])) from hehtable, generate_series(1, 5) i;
Thanks folks.
--
GJ