Re: [SQL] pqReadData() error

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: [SQL] pqReadData() error
Дата
Msg-id 18704.944496086@sss.pgh.pa.us
обсуждение исходный текст
Ответ на pqReadData() error  ("Imtiaz. S. M" <imtiaz_sm@yahoo.com>)
Список pgsql-sql
"Imtiaz. S. M" <imtiaz_sm@yahoo.com> writes:
> ... to delete the oldest data I use the following SQL queries.

> create table tempdates as select distinct dates from daydata;
> delete from daydata where dates in (select b.dates from tempdates a,
> tempdates b where a.dates-b.dates > '22 hours');

This is certainly the hard, hard way to do it.  Why not

create table tempdates as   select max(dates) - '22 hours'::interval as cutoff from daydata;
delete from daydata where dates < (select cutoff from tempdates);

As you have it, the subselect must generate O(N^2) rows for an N-row
daydata table (assuming the distribution of dates is fairly even).
If I read you correctly, there are about 12,000 rows in daydata at all
times?  If so the subselect will examine 144,000,000 pairs of a.dates
and b.dates, and probably output about 25,000 rows (500 a rows matching
each b row, and vice versa).

That's a lot of data, and then it has to be scanned over for each row of
the daydata table in order to implement the IN; the rows that you are
keeping will require a comparison against *every single one* of the
subselect output rows to verify that the IN fails.  So about 11,500 *
25,000 row comparisons done by the IN just for the non-deleted rows,
and probably a few million more for the deletable rows.

Lots faster to calculate the cutoff time once and then do one compare
for each row.

You could probably simplify this even more, to

delete from daydata
where dates < (select max(dates) - '22 hours'::interval from daydata);

unless you have some other use for the temp table's contents.

> The query takes a long time and then gives me the
> following error

> "pqReadData() -- backend closed the channel unexpectedly.

Probably all that date arithmetic is overflowing memory with temporary
results ... which won't get reclaimed till end of query, at present.
But even if it didn't run out of memory, doing it this way is very slow.
        regards, tom lane


В списке pgsql-sql по дате отправления:

Предыдущее
От: cristiv
Дата:
Сообщение: (no subject)
Следующее
От: Tom Lane
Дата:
Сообщение: Re: [SQL] How to avoid "Out of memory" using aggregate functions?