Re: EXTERNAL storage and substring on long strings

Поиск
Список
Период
Сортировка
От Joe Conway
Тема Re: EXTERNAL storage and substring on long strings
Дата
Msg-id 3F316A41.6020403@joeconway.com
обсуждение исходный текст
Ответ на Re: EXTERNAL storage and substring on long strings  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: EXTERNAL storage and substring on long strings
Список pgsql-performance
Tom Lane wrote:
> Scott Cain <cain@cshl.org> writes:
>
>>A few days ago, I asked for advice on speeding up substring queries on
>>the GENERAL mailing list.  Joe Conway helpfully pointed out the ALTER
>>TABLE STORAGE EXTERNAL documentation.  After doing the alter,
>>the queries got slower!  Here is the background:
>
> Ah-hah, I've sussed it ... you didn't actually change the storage
> representation.  You wrote:

Yeah, I came to the same conclusion this morning (update longdna set dna
= dna || '';), but it still seems that the chunked table is very
slightly faster than the substring on the  externally stored column:

dna=# explain analyze select pdna from dna where foffset > 6000000 and
foffset < 6024000;
                                                     QUERY PLAN
------------------------------------------------------------------------------------------------------------------
  Index Scan using foffset_idx on dna  (cost=0.00..4.22 rows=14
width=32) (actual time=0.06..0.16 rows=11 loops=1)
    Index Cond: ((foffset > 6000000) AND (foffset < 6024000))
  Total runtime: 0.27 msec
(3 rows)

dna=# explain analyze select pdna from dna where foffset > 6000000 and
foffset < 6024000;
                                                     QUERY PLAN
------------------------------------------------------------------------------------------------------------------
  Index Scan using foffset_idx on dna  (cost=0.00..4.22 rows=14
width=32) (actual time=0.07..0.16 rows=11 loops=1)
    Index Cond: ((foffset > 6000000) AND (foffset < 6024000))
  Total runtime: 0.25 msec
(3 rows)

dna=# explain analyze select substr(dna,6002000,20000) from longdna;
                                            QUERY PLAN
------------------------------------------------------------------------------------------------
  Seq Scan on longdna  (cost=0.00..1.01 rows=1 width=32) (actual
time=0.46..0.47 rows=1 loops=1)
  Total runtime: 0.58 msec
(2 rows)

dna=# explain analyze select substr(dna,6002000,20000) from longdna;
                                            QUERY PLAN
------------------------------------------------------------------------------------------------
  Seq Scan on longdna  (cost=0.00..1.01 rows=1 width=32) (actual
time=0.23..0.24 rows=1 loops=1)
  Total runtime: 0.29 msec
(2 rows)

I ran each command twice after starting psql to observe the effects of
caching.

However with the provided sample data, longdna has only one row, and dna
has 122,540 rows, all of which are chunks of the one longdna row. I
would tend to think that if you had 1000 or so longdna records indexed
on some id column, versus 122,540,000 dna chunks indexed on both an id
and segment column, the substring from longdna would win.

Joe


В списке pgsql-performance по дате отправления:

Предыдущее
От: "scott.marlowe"
Дата:
Сообщение: Re: ext3 block size
Следующее
От: Tom Lane
Дата:
Сообщение: Re: EXTERNAL storage and substring on long strings