Re: Gsoc2012 Idea --- Social Network database schema

From: Robert Haas
Subject: Re: Gsoc2012 Idea --- Social Network database schema
Date:
Msg-id: CA+TgmoZf__+sPnjO8bFbsSR_YV8Tt7ZBZ-HQeP9bQdp2P6ogGQ@mail.gmail.com
In reply to: Re: Gsoc2012 Idea --- Social Network database schema  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: Gsoc2012 Idea --- Social Network database schema
Re: Gsoc2012 Idea --- Social Network database schema
List: pgsql-hackers
On Wed, Mar 21, 2012 at 11:34 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> Well, the standard syntax apparently aims to reduce the number of
>> returned rows, which ORDER BY does not.  Maybe you could do it with
>> ORDER BY .. LIMIT, but the idea here I think is that we'd like to
>> sample the table without reading all of it first, so that seems to
>> miss the point.
>
> I think actually the traditional locution is more like
>        WHERE random() < constant
> where the constant is the fraction of the table you want.  And yeah,
> the presumption is that you'd like it to not actually read every row.
> (Though unless the sampling density is quite a bit less than 1 row
> per page, it's not clear how much you're really going to win.)

Well, there's something mighty tempting about having a way to say
"just give me a random sample of the blocks and I'll worry about
whether that represents a random sample of the rows".
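
To put rough numbers on the trade-off discussed above, here is a minimal Python sketch (not part of the original message). It assumes each row is kept independently with probability p, as with WHERE random() < constant, and that every page holds the same number of rows. The function names and the figure of 100 rows per page are illustrative assumptions, not taken from PostgreSQL.

```python
import random

def expected_page_fraction(p, rows_per_page):
    """Expected fraction of pages that contain at least one sampled row
    when every row is kept independently with probability p."""
    return 1.0 - (1.0 - p) ** rows_per_page

def sample_blocks(total_blocks, fraction, seed=None):
    """Pick a random subset of block numbers. A block-level sample would
    return every row on the chosen blocks and skip all other blocks."""
    rng = random.Random(seed)
    k = max(1, round(total_blocks * fraction))
    return sorted(rng.sample(range(total_blocks), k))

if __name__ == "__main__":
    # Row-level 1% sample, assuming 100 rows per page (hypothetical figure).
    print(expected_page_fraction(0.01, 100))
    # Block-level 1% sample of a hypothetical 1000-block table.
    print(sample_blocks(1000, 0.01, seed=1))
```

Under these assumptions, a 1% row-level sample still has to touch roughly 63% of the pages, whereas a 1% block-level sample reads only 1% of them, at the cost of returning rows that are clustered by block.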

It's occurred to me a few times that it's pretty unfortunate you can't
do that with a TID condition.

rhaas=# explain select * from randomtext where ctid >= '(500,1)' and
ctid < '(501,1)';
                             QUERY PLAN
--------------------------------------------------------------------
 Seq Scan on randomtext  (cost=0.00..111764.90 rows=25000 width=31)
   Filter: ((ctid >= '(500,1)'::tid) AND (ctid < '(501,1)'::tid))
(2 rows)

The last time this came up for me was when I was trying to find which
row in a large table was making the SELECT blow up; but it seems like
it could be used to implement a poor man's sampling method, too... it
would be nicer, in either case, to be able to specify the block
numbers you'd like to read, rather than bounding the CTID
from both ends as in the above example.
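
As an illustration of the bounding workaround described above (not part of the original message), the following Python sketch builds the same half-open ctid range for each block in a list and ORs them together. The helper name ctid_block_predicate is hypothetical. Note that the plan shown above runs such a condition as a filter on a sequential scan, so this only restricts which rows come back; it should not be assumed to reduce I/O.

```python
def ctid_block_predicate(blocks):
    """Build a WHERE-clause fragment matching every tuple on the given
    blocks, using the bounds from the example query:
    ctid >= '(b,1)' AND ctid < '(b+1,1)' for each block b."""
    parts = []
    for b in sorted(set(blocks)):
        if b < 0:
            raise ValueError("block numbers must be non-negative")
        parts.append(f"(ctid >= '({b},1)' AND ctid < '({b + 1},1)')")
    if not parts:
        # No blocks requested: match nothing.
        return "false"
    return " OR ".join(parts)

if __name__ == "__main__":
    # Table name taken from the example above; block numbers are arbitrary.
    print("SELECT * FROM randomtext WHERE "
          + ctid_block_predicate([500, 737]) + ";")
```

The block numbers could come from any random choice over the table's block range, which gives the "random sample of the blocks" idea in a form that can be tried with the syntax already shown.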

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

