Thread: Re: [SQL] OFFSET impact on Performance???


Re: [SQL] OFFSET impact on Performance???

From:
"Merlin Moncure"
Date:
> Now I read all the posts and I have some answers.
>
> Yes, I have a web application.
> I HAVE to know exactly how many pages I have, and I have to allow the
> user to jump to a specific page (this is where I used LIMIT and
> OFFSET). We have this feature and I cannot take it out.

If your working set is small, say a couple hundred records at the most
(a web form or such), limit/offset may be ok.  However, you are already
paying double, because you are also extracting the number of records
matching your WHERE clause, yes?  And that count can change while the
user is browsing.

IOW, your application code is writing expensive checks that the database
has to cash.
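
Concretely, the usual pattern pays for the WHERE clause twice per page
view (a sketch; table and column names here are made up):

    SELECT count(*) FROM tab WHERE col > 0;   -- total, for the page count
    SELECT * FROM tab WHERE col > 0
     ORDER BY col LIMIT 50 OFFSET 200;        -- the rows for page 5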

> >> > SELECT * FROM tab WHERE col > ? ORDER BY col LIMIT 50
> Now this solution looks very fast, but I cannot implement it, because I
> cannot jump from page 1 to page xxxx, only to page 2, since with this
> approach I only know where page 1 ended. And we have some really
> complicated WHEREs, and about 10 tables are involved in the SQL query.
> About CURSORs, I have to read more about them, because this is the
> first time I have heard of them.
> I don't know if temporary tables are a solution; really I don't think
> so, since there are a lot of users working at the same time on the same
> page.

Cursors are held by a connection.  If your web app keeps a persistent
connection, you can use them.  In this case, pass the where clause to a
plpgsql function which returns a composite object containing a refcursor
and the number of rows (read the docs!).  If/when pg gets shared
cursors, this may be the way to go... but in that case you may have to
worry about closing them.
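
Roughly, such a function could look like this (a sketch against a modern
PostgreSQL, not a tested implementation; "tab", "col" and the function
name are invented, and the concatenated WHERE fragment must come from
trusted code, never from user input):

    CREATE TYPE search_handle AS (cur refcursor, total bigint);

    CREATE OR REPLACE FUNCTION open_search(_cond text)
    RETURNS search_handle LANGUAGE plpgsql AS $$
    DECLARE
        c refcursor := 'search_cur';  -- named portal the client can FETCH from
        h search_handle;
    BEGIN
        EXECUTE 'SELECT count(*) FROM tab WHERE ' || _cond INTO h.total;
        OPEN c FOR EXECUTE 'SELECT * FROM tab WHERE ' || _cond
                           || ' ORDER BY col';
        h.cur := c;
        RETURN h;
    END;
    $$;

    -- all on the same (persistent) connection, inside one transaction:
    BEGIN;
    SELECT * FROM open_search('col > 0');
    MOVE 100 IN search_cur;
    FETCH 50 FROM search_cur;   -- page 3, without re-running the query
    COMMIT;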

Without a persistent connection, you need some type of persistence on
the database side.  This is complex but it can be done... though it will
not be faster than limit/offset for browsing relatively small sets.

Merlin

Re: [SQL] OFFSET impact on Performance???

From:
Greg Stark
Date:
> > Now I read all the posts and I have some answers.
> >
> > Yes, I have a web application. I HAVE to know exactly how many pages I have
> > and I have to allow the user to jump to a specific page (this is where I
> > used limit and offset). We have this feature and I cannot take it out.

I'm afraid you have a problem then. The only way postgres can know exactly how
many pages there are, and let users jump to an arbitrary point of an arbitrary
query, is by doing what OFFSET and LIMIT do.

There are ways to optimize this, but they'll be a lot of work, and they only
amount to moving around when the work is done. The work of gathering all the
records from the query still has to be done sometime.

If the queries are relatively static you could preprocess the data so you have
all the results in a table with a sequential id. Then you can get the maximum
and jump around in the table using an index all you want.

Otherwise you could consider performing the queries on demand and storing the
results in a temporary table. Then fetch the actual records for the page from
the temporary table, again using an index on a sequential id to jump around.
This might make the initial query much slower, though, since you have to wait
for the entire query to complete and the records to be stored. You'll also
have to deal with vacuuming this table aggressively.
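
Such a sketch might look like this (names invented; it relies on nextval()
being evaluated in the subquery's sort order, which works in practice but
is not formally guaranteed):

    CREATE TEMPORARY SEQUENCE rownum_seq;
    CREATE TEMPORARY TABLE results AS
        SELECT nextval('rownum_seq') AS n, q.id
          FROM (SELECT id FROM tab WHERE col > 0 ORDER BY col) q;
    CREATE UNIQUE INDEX results_n_idx ON results (n);

    SELECT max(n) FROM results;       -- exact result count, hence page count
    SELECT t.*
      FROM results r JOIN tab t USING (id)
     WHERE r.n BETWEEN 101 AND 150    -- page 3 at 50 rows per page
     ORDER BY r.n;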


--
greg

Re: [SQL] OFFSET impact on Performance???

From:
PFC
Date:
Supposing your searches display results which are rows coming from one
specific table, you could create a cache table:

    search_id    serial primary key
    index_n      position of this result in the global result set
    result_id    id of the resulting row

Then, making a search with 50k results would INSERT INTO cache ... SELECT
FROM the search query, with a way to set the index_n column, which can be a
temporary sequence...

Then to display your pages, SELECT from your table with index_n BETWEEN so
and so, and join to the data table.

If you're worried that it might take up too much space: store an integer
array of result_ids instead of a single result_id; this way you insert
fewer rows and save disk space. Generate it with a custom aggregate...
then just grab a row from this table; it contains all the ids of the rows
to display.
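
In SQL that could look something like this (a sketch; "data_table", the
WHERE clause and the search id are placeholders, and the key is widened to
(search_id, index_n) so one table can hold many searches):

    CREATE TABLE cache (
        search_id  integer,
        index_n    integer,    -- position in the global result set
        result_id  integer,    -- id of the resulting row
        PRIMARY KEY (search_id, index_n)
    );

    CREATE TEMPORARY SEQUENCE pos_seq;
    INSERT INTO cache (search_id, index_n, result_id)
        SELECT 1, nextval('pos_seq'), q.id
          FROM (SELECT id FROM data_table WHERE col > 0 ORDER BY col) q;

    -- rows 51 to 100 of search 1:
    SELECT d.*
      FROM cache c
      JOIN data_table d ON d.id = c.result_id
     WHERE c.search_id = 1 AND c.index_n BETWEEN 51 AND 100
     ORDER BY c.index_n;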



Re: [SQL] OFFSET impact on Performance???

From:
Alex Turner
Date:
The problem with this approach is TTFB (time to first byte).  The
initial query is very slow, but additional requests are fast.  In most
situations we do not want the user to wait a disproportionately long
time for the initial query; if this is their first time using the
system, that is the impression that will stick with them.  I guess we
could experiment and see how much extra time creating a cache table
takes...

Alex Turner
NetEconomist


On Wed, 26 Jan 2005 13:58:18 +0100, PFC <lists@boutiquenumerique.com> wrote:
>
> Supposing your searches display results which are rows coming from one
> specific table, you could create a cache table :
> [...]

Re: [SQL] OFFSET impact on Performance???

From:
Greg Stark
Date:
Alex Turner <armtuk@gmail.com> writes:

> The problem with this approach is TTFB (Time to first Byte).  The
> initial query is very slow, but additional requests are fast.  In most
> situations we do not want the user to have to wait a disproportionate
> amount of time for the initial query.  If this is the first time using
> the system this will be the impression that will stick with them.  I
> guess we could experiment and see how much extra time creating a cache
> table will take...

You could cheat and serve queries with an offset of 0 directly, but also start
a background job to fetch the complete results and cache them. Queries with a
non-zero offset would have to wait until the complete cache is built. You have
to be careful about people arriving from bookmarks to non-zero offsets, and
about people hitting reload before the cache is finished being built.
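
The bookkeeping for that could be as simple as a flag per search (a
sketch, all names invented):

    CREATE TABLE cache_ready (
        search_id integer PRIMARY KEY,
        ready     boolean NOT NULL DEFAULT false
    );

    -- page 1: run the real query with LIMIT 50 OFFSET 0 and return at once.
    -- The background job fills the cache for search 1, then:
    UPDATE cache_ready SET ready = true WHERE search_id = 1;

    -- a deep page, a reload, or a bookmarked offset checks the flag first:
    SELECT ready FROM cache_ready WHERE search_id = 1;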

As someone else suggested you could look into other systems for storing the
cache. If you don't need to join against other database tables and you don't
need the reliability of a database then there are faster solutions like
memcached for example. (The problem of joining against database tables is even
solvable, look up pgmemcached. No idea how it performs though.)

But I think you're running into a fundamental tension here. The feature you're
looking for (being able to jump around in an arbitrarily large, non-indexed
query result set) requires a lot of work. All you can do is shift around
*when* that work is done; there's no way to avoid doing the work entirely.

--
greg

Re: [SQL] OFFSET impact on Performance???

From:
PFC
Date:
> The problem with this approach is TTFB (Time to first Byte).  The
> initial query is very slow, but additional requests are fast.  In most
> situations we do not want the user to have to wait a disproportionate
> amount of time for the initial query.  If this is the first time using
> the system this will be the impression that will stick with them.  I
> guess we could experiment and see how much extra time creating a cache
> table will take...

    Do it on the second page then ;)

    Seriously:
    - If you want to display the result count and page count, you'll need to
do the whole query anyway, so you might as well save the results.
    - Inserting the result ids in a temp table one by one will be slow, but
you can do this:

select array_accum(id) from temp group by id/20 limit 3;
                           array_accum
---------------------------------------------------------------
  {1,2,4,8,16,17,9,18,19,5,10,11,3,6,12,13,7,14,15}
  {32,33,34,35,36,37,38,39,20,21,22,23,24,25,26,27,28,29,30,31}
  {40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59}

    - A really big search of 131072 results:
create table cache (id serial primary key, value integer[]);
explain analyze insert into cache (value) select array_accum(id) from temp
group by id/100;
  Subquery Scan "*SELECT*"  (cost=14382.02..17986.50 rows=131072 width=32)
(actual time=961.746..1446.630 rows=1311 loops=1)
    ->  GroupAggregate  (cost=14382.02..16020.42 rows=131072 width=4)
(actual time=961.607..1423.803 rows=1311 loops=1)
          ->  Sort  (cost=14382.02..14709.70 rows=131072 width=4) (actual
time=961.181..1077.662 rows=131072 loops=1)
                Sort Key: (id / 100)
                ->  Seq Scan on "temp"  (cost=0.00..2216.40 rows=131072
width=4) (actual time=0.032..291.652 rows=131072 loops=1)
  Total runtime: 1493.304 ms

    Note that the "SELECT..." part takes 1400 ms and the INSERT part takes
the rest, which is really small. It's the sort which takes most of the
time, but you'll be doing it anyway to get your results in order, so it
comes for free. This will generate some 1300 pages with 100 results on
each. If your searches yield, say, 1000 results, it'll be perfectly fine,
and caching the results should take well under 100 ms (not counting the
total query time, of course!)

    Using arrays is the key here, because inserting all the results as
individual rows in the table would be a whole lot slower!
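
Note: array_accum is not a built-in aggregate; it can be declared as in
the PostgreSQL documentation's user-defined aggregate example, with the
built-in array_append as the state function:

    CREATE AGGREGATE array_accum (
        basetype = anyelement,
        sfunc    = array_append,
        stype    = anyarray,
        initcond = '{}'
    );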





Re: [SQL] OFFSET impact on Performance???

From:
Alex Turner
Date:
That's a really good idea: just store a list of the sorted ids in the
temp table - a small amount of data to insert... I like it!

Alex Turner
NetEconomist


On Wed, 26 Jan 2005 22:24:34 +0100, PFC <lists@boutiquenumerique.com> wrote:
> > The problem with this approach is TTFB (Time to first Byte). [...]
>
>         Do it on the second page then ;)
> [...]

Re: [SQL] OFFSET impact on Performance???

From:
PFC
Date:
> That's a really good idea: just store a list of the sorted ids in the
> temp table - a small amount of data to insert... I like it!
>
> Alex Turner
> NetEconomist

    The best part is that you can skip the LIMIT/OFFSET entirely if you put
page numbers in your cache table while inserting into it, via a temporary
sequence or something. Retrieving the results will then be very fast, but
beware that SELECT * FROM table WHERE id =ANY( array ) won't use an index,
so you'll have to trick the thing by generating a query with IN, or
joining against a SRF returning the elements of the array one by one,
which might be better.
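
For instance (a sketch with invented names), number the rows with a
temporary sequence while inserting and derive the page number in the
GROUP BY, so each page is one fetchable row:

    CREATE TABLE cache (
        search_id integer,
        page      integer,
        ids       integer[],
        PRIMARY KEY (search_id, page)
    );

    CREATE TEMPORARY SEQUENCE rn;
    INSERT INTO cache (search_id, page, ids)
        SELECT 1, r.n / 100, array_accum(r.id)
          FROM (SELECT nextval('rn') AS n, id
                  FROM (SELECT id FROM data_table ORDER BY col) q) r
         GROUP BY r.n / 100;

    -- page 5 of search 1, no LIMIT/OFFSET anywhere:
    SELECT ids FROM cache WHERE search_id = 1 AND page = 5;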

Re: [SQL] OFFSET impact on Performance???

From:
Oleg Bartunov
Date:
On Thu, 27 Jan 2005, PFC wrote:

>
>> Thats a really good idea, just store a list of the sorted ids in the
>> temp table - small amount of data for insert... I like it!
>>
>> Alex Turner
>> NetEconomist
>
>     The best part is that you can skip the LIMIT/OFFSET entirely if you
> put page numbers in your cache table while inserting into it, via a temporary
> sequence or something. Retrieving the results will then be very fast, but
> beware that SELECT * FROM table WHERE id =ANY( array ) won't use an index, so

contrib/intarray provides index access to such queries.

> you'll have to trick the thing by generating a query with IN, or joining
> against a SRF returning the elements of the array one by one, which might be
> better.
>

     Regards,
         Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83

Re: [SQL] OFFSET impact on Performance???

From:
PFC
Date:
>>     The best part is that you can skip the LIMIT/OFFSET entirely if you
>> put page numbers in your cache table while inserting into it, via a
>> temporary sequence or something. Retrieving the results will then be
>> very fast, but beware that SELECT * FROM table WHERE id =ANY( array )
>> won't use an index, so
>
> contrib/intarray provides index access to such queries.

    Can you provide an example of such a query? I've looked at the operators
for intarray without finding one.
    Thanks.

Re: [SQL] OFFSET impact on Performance???

From:
Oleg Bartunov
Date:
On Thu, 27 Jan 2005, PFC wrote:

>
>>>     The best part is that you can skip the LIMIT/OFFSET entirely if you
>>> put page numbers in your cache table while inserting into it, via a
>>> temporary sequence or something. Retrieving the results will then be very
>>> fast, but beware that SELECT * FROM table WHERE id =ANY( array ) won't use
>>> an index, so
>>
>> contrib/intarray provides index access to such queries.
>
>     Can you provide an example of such a query ? I've looked at the
> operators for intarray without finding it.

for example,
http://www.sai.msu.su/~megera/postgres/gist/code/7.3/README.intarray
see OPERATIONS and EXAMPLE USAGE:

SELECT * FROM table WHERE id && int[]
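
Spelled out (names invented; note that, as the follow-ups say, this needs
an integer[] column with a gist index, not a plain integer key):

    CREATE TABLE msg (mid serial PRIMARY KEY, sections integer[]);
    CREATE INDEX msg_sections_idx ON msg USING gist (sections gist__int_ops);

    -- && is "overlap": rows whose sections share an element with the array
    SELECT * FROM msg WHERE sections && '{3,4}'::integer[];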


>     Thanks.
>

     Regards,
         Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83

Re: [SQL] OFFSET impact on Performance???

From:
Greg Stark
Date:
Oleg Bartunov <oleg@sai.msu.su> writes:

> On Thu, 27 Jan 2005, PFC wrote:
>
> >
> > > > beware that SELECT * FROM table WHERE id =ANY( array ) won't use an index,

> > > contrib/intarray provides index access to such queries.
> >
> > Can you provide an example of such a query ? I've looked at the operators
> > for intarray without finding it.
>
> for example,
> http://www.sai.msu.su/~megera/postgres/gist/code/7.3/README.intarray
> see OPERATIONS and EXAMPLE USAGE:
>
> SELECT * FROM table WHERE id && int[]

I don't think that helps him. He wants the join to the *other* table to use an
index. It would be nice if the IN plan used an index for =ANY(array) just like
it does for =ANY(subquery) but I'm not sure the statistics are there. It might
not be a bad plan to just assume arrays are never going to be millions of
elements long though.

There is a way to achieve this using "int_array_enum" from another contrib
module, "intagg". My current project uses something similar to this except the
arrays are precomputed. When I went to 7.4 the new array support obsoleted
everything else I was using from the "intagg" and "array" contrib modules
except for this one instance where intagg is still necessary.

It is a bit awkward but it works:

slo=> EXPLAIN
       SELECT *
         FROM foo
         JOIN (SELECT int_array_enum(foo_ids) AS foo_id
                 FROM cache
                WHERE cache_id = 1) AS x
        USING (foo_id) ;
                                            QUERY PLAN
--------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..6.40 rows=1 width=726)
   ->  Subquery Scan x  (cost=0.00..3.18 rows=1 width=4)
         ->  Index Scan using idx_cache on cache  (cost=0.00..3.17 rows=1 width=30)
               Index Cond: (cache_id = 1)
   ->  Index Scan using foo_pkey on foo  (cost=0.00..3.21 rows=1 width=726)
         Index Cond: (foo.foo_id = "outer".foo_id)
(6 rows)


(query and plan edited for clarity and for paranoia purposes)


--
greg

Re: [SQL] OFFSET impact on Performance???

From:
PFC
Date:

> for example,
> http://www.sai.msu.su/~megera/postgres/gist/code/7.3/README.intarray
> see OPERATIONS and EXAMPLE USAGE:
>

    Thanks, I already know this documentation and have used intarray before
(I find it absolutely fabulous in the right application; it has great
potential for getting out of tight situations which would otherwise involve
huge, unmanageable pivot or attribute tables). Its only drawback is that
gist index creation is slow and sometimes just... takes forever until the
disk is full.
    However, it seems that integer && integer[] does not exist:

> SELECT * FROM table WHERE id && int[]

explain analyze select * from temp t where id &&
( '{1,2,3,4,5,6,7,8,9,10,11,12}'::integer[] );
ERROR:  operator does not exist: integer && integer[]
HINT:  No operator matches the given name and argument type(s). You may
need to add explicit type casts.

    I have already used this type of intarray index, but you have to create
a special gist index with gist__int_ops on the column, and the column has
to be an array. In my case the column is just a SERIAL PRIMARY KEY and
should stay that way, and I don't want to create a functional index on
array[id] just for this feature; so I guess I can't use the && operator.
Am I mistaken? My index is the standard btree here.

    It would be nice if =ANY() could use the index just like IN does;
besides, at planning time the length of the array is known, which makes it
behave quite like IN().

    So I'll use either an EXECUTE'd plpgsql-generated query (IN (....)),
which I don't like because it's a kludge, or this other solution, which I
find more elegant:

CREATE OR REPLACE FUNCTION tools.array_srf( INTEGER[] )
    RETURNS SETOF INTEGER
    RETURNS NULL ON NULL INPUT
    LANGUAGE plpgsql AS
$$
DECLARE
    _data ALIAS FOR $1;
    _i    INTEGER;
BEGIN
    -- icount() comes from contrib/intarray: the number of array elements
    FOR _i IN 1..icount(_data) LOOP
        RETURN NEXT _data[_i];
    END LOOP;
    RETURN;
END;
$$;

-----------------------------------------------------------------------------------
explain analyze select * from temp t where id
=ANY( '{1,2,3,4,5,6,7,8,9,10,11,12}' );
  Seq Scan on "temp" t  (cost=0.00..5165.52 rows=65536 width=8) (actual
time=0.030..173.319 rows=12 loops=1)
    Filter: (id = ANY ('{1,2,3,4,5,6,7,8,9,10,11,12}'::integer[]))
  Total runtime: 173.391 ms

-----------------------------------------------------------------------------------
explain analyze select * from temp t where id
IN( 1,2,3,4,5,6,7,8,9,10,11,12 );
  Index Scan using temp_pkey, temp_pkey, temp_pkey, temp_pkey, temp_pkey,
temp_pkey, temp_pkey, temp_pkey, temp_pkey, temp_pkey, temp_pkey,
temp_pkey on "temp" t  (cost=0.00..36.49 rows=12 width=8) (actual
time=0.046..0.137 rows=12 loops=1)
    Index Cond: ((id = 1) OR (id = 2) OR (id = 3) OR (id = 4) OR (id = 5)
OR (id = 6) OR (id = 7) OR (id = 8) OR (id = 9) OR (id = 10) OR (id = 11)
OR (id = 12))
  Total runtime: 0.292 ms

-----------------------------------------------------------------------------------
explain analyze select * from temp t where id in (select * from
tools.array_srf('{1,2,3,4,5,6,7,8,9,10,11,12}'));
  Nested Loop  (cost=15.00..620.20 rows=200 width=8) (actual
time=0.211..0.368 rows=12 loops=1)
    ->  HashAggregate  (cost=15.00..15.00 rows=200 width=4) (actual
time=0.160..0.173 rows=12 loops=1)
          ->  Function Scan on array_srf  (cost=0.00..12.50 rows=1000
width=4) (actual time=0.127..0.139 rows=12 loops=1)
    ->  Index Scan using temp_pkey on "temp" t  (cost=0.00..3.01 rows=1
width=8) (actual time=0.010..0.012 rows=1 loops=12)
          Index Cond: (t.id = "outer".array_srf)
  Total runtime: 0.494 ms

-----------------------------------------------------------------------------------
explain analyze select * from temp t, (select * from
tools.array_srf('{1,2,3,4,5,6,7,8,9,10,11,12}')) foo where foo.array_srf =
t.id;

  Merge Join  (cost=62.33..2824.80 rows=1000 width=12) (actual
time=0.215..0.286 rows=12 loops=1)
    Merge Cond: ("outer".id = "inner".array_srf)
    ->  Index Scan using temp_pkey on "temp" t  (cost=0.00..2419.79
rows=131072 width=8) (actual time=0.032..0.056 rows=13 loops=1)
    ->  Sort  (cost=62.33..64.83 rows=1000 width=4) (actual
time=0.169..0.173 rows=12 loops=1)
          Sort Key: array_srf.array_srf
          ->  Function Scan on array_srf  (cost=0.00..12.50 rows=1000
width=4) (actual time=0.127..0.139 rows=12 loops=1)
  Total runtime: 0.391 ms

-----------------------------------------------------------------------------------
Note that the meaning is different; the IN removes duplicates in the
array but the join does not.
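
For example, with a duplicate in the array (assuming rows with id 1 and 2
exist in temp):

    select count(*) from temp t
     where t.id in (select * from tools.array_srf('{1,1,2}'));  -- counts 2

    select count(*)
      from temp t, (select * from tools.array_srf('{1,1,2}')) foo
     where foo.array_srf = t.id;                -- counts 3: both 1s survive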



Re: [SQL] OFFSET impact on Performance???

From:
Oleg Bartunov
Date:
On Thu, 27 Jan 2005, PFC wrote:

>
>
>> for example,
>> http://www.sai.msu.su/~megera/postgres/gist/code/7.3/README.intarray
>> see OPERATIONS and EXAMPLE USAGE:
>>
>
>     Thanks, I already know this documentation and have used intarray
> before (I find it absolutely fabulous in the right application, it has a
> great potential for getting out of tight situations which would involve huge
> unmanageable pivot or attributes tables). Its only drawback is that the gist
> index creation time is slow and sometimes just... takes forever until the
> disk is full.
>     However, it seems that integer && integer[] does not exist :

Try intset(id) && int[]. intset is an undocumented function :)
I'm going to add intset() to README.

> [...]

     Regards,
         Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83

Re: [SQL] OFFSET impact on Performance???

From:
PFC
Date:
>>     However, it seems that integer && integer[] does not exist :
>
> Try intset(id) && int[]. intset is an undocumented function :)
> I'm going to add intset() to README.
>
>>
>>> SELECT * FROM table WHERE id && int[]

    Mm.
    intset(x) seems to be like array[x]?
    Actually what I want is the opposite. I have a btree index on an integer
column; I wanted to use this index and not create a functional index...
which is why I wanted to use =ANY(). If I had a gist index on an integer
array column, I would of course use what you suggest, but this is not the
case...

    Anyway, I think the SRF solution works well; I like it.

    Note that int_agg_final_array() crashes my postgres; see my message in
pgsql-general.

    Regards,
    Pierre

Re: [SQL] OFFSET impact on Performance???

From:
Greg Stark
Date:
PFC <lists@boutiquenumerique.com> writes:

>     intset(x) seems to be like array[x]?
>     Actually what I want is the opposite.

What you want is called UNNEST. It didn't get done in time for 8.0. But if
what you have is an array of integers, the int_array_enum() function I quoted
in the other post is basically that.
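
For instance (invented values), contrib/intagg's int_array_enum() expands
an int4[] into one row per element, which is the UNNEST-like piece:

    SELECT int_array_enum('{42,7,13}'::int4[]);   -- three rows: 42, 7, 13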

> Note that int_agg_final_array() crashes my postgres; see my message in
> pgsql-general.

You don't really need the int_array_aggregate function any more. You can write
your own aggregate using the new array operators:

test=> create or replace function array_push (anyarray, anyelement) returns anyarray
test->   as 'select $1 || $2' language sql immutable strict;
CREATE FUNCTION
test=> create aggregate array_aggregate (basetype=anyelement, sfunc=array_push, stype=anyarray, initcond='{}');
CREATE AGGREGATE

Of course it's about 50x slower than the C implementation though:

test=> select icount(array_aggregate (foo_id)) from foo;
 icount
--------
  15127
(1 row)

Time: 688.419 ms

test=> select icount(int_array_aggregate (foo_id)) from foo;
 icount
--------
  15127
(1 row)

Time: 13.680 ms

(And no, that's not a caching artifact; the whole table is cached for both
trials)

--
greg

Re: [SQL] OFFSET impact on Performance???

From:
PFC
Date:
> What you want is called UNNEST. It didn't get done in time for 8.0. But if
> what you have is an array of integers, the int_array_enum() function I
> quoted in the other post is basically that.

    Yes, I used it, thanks. That's what I wanted. The query plans are good.

> You don't really need the int_array_aggregate function any more. You can
> write your own aggregate using the new array operators:
> Of course it's about 50x slower than the C implementation though:

    Heh. I'll keep using int_array_aggregate ;)

    Have a nice day.

Re: [SQL] OFFSET impact on Performance???

From:
Kevin Brown
Date:
PFC wrote:
>
> Supposing your searches display results which are rows coming from one
> specific table, you could create a cache table :
>
> search_id    serial primary key
> index_n    position of this result in the global result set
> result_id    id of the resulting row.
>
> Then, making a search with 50k results would INSERT INTO cache ... SELECT
> FROM search query, with a way to set the index_n column, which can be a
> temporary sequence...
>
> Then to display your pages, SELECT from your table with index_n BETWEEN so
> and so, and join to the data table.

This is a nice way of doing a fast materialized view.  But it looked
to me like one of the requirements of the original poster is that the
result set being displayed has to be "current" as of the page display
time.  If inserts to the original table have been committed between
the time the current page was displayed and "now", the next page view
is supposed to show them.  That basically means rerunning the query
that was used to build the cache table.

But perhaps the original poster is willing to live with the idea that
new rows won't show up in the result set, as long as updates show up
(because the cache table is just a fancy index) and deletes "work"
(because the join against the data table will show only rows that are
common between both).


--
Kevin Brown                          kevin@sysexperts.com