Discussion: Re: Browsing the tables / why pgsql does not perform well (with temp fix)

Re: Browsing the tables / why pgsql does not perform well (with temp fix)

From
Hannu Krosing
Date:
I wrote :

> The Hermit Hacker  wrote:
> > Jan Vicherek wrote:
> > >   Please please help me solve this or make workarounds or anything. I
> > > would *really* like to see PostgreSQL playing against the Big
> > > (commercial) Boys!
> >
> >         I'm curious, but can the "Big (commercial) Boys" do this?  If so,
> > can you please provide an example of which and how?
>
They optimise sorting (i.e. use indexes for it) as well, and so make it
doable using cursors.

I even think that Oracle can use indexes for some cases of count(*) and
min() and max().
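
For reference: the effect of an index-aware min()/max() can be imitated by
hand in PostgreSQL releases that support LIMIT, since an ORDER BY ... LIMIT 1
can be answered from one end of a b-tree index. A rough sketch, with
mytable/n as placeholder names:

--8<---------
-- same answer as "select min(n) from mytable", readable from the first
-- entry of an index on n instead of a full scan
select n from mytable where n is not null order by n limit 1;

-- the max() counterpart, read from the other end of the index
-- (the "is not null" keeps the aggregate semantics, which ignore NULLs)
select n from mytable where n is not null order by n desc limit 1;
--8<---------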

> Most of us here have
> > access to an one or the other (me, Oracle) to use as a sample system...if
> > we can prove that it does work on another system, then we have something
> > to work with, but right now all I've seen is "I wish I could do this", and
> > several examples on how to accomplish it using PostgreSQL, but that's
> > it...
> >
> The main problem is that PostgreSQL does not use index for sorting and
> thus really does a
> "copy" of the whole table and then sorts it before it can use a few rows
> from the beginning.
>
> Using indexes for sorting as well as selecting is on the TODO list, but
> seems to be not very high priority.
>
> >
> > >    0. having a value of a field on which there is an index, how can
> > I do :
> > >        a) current_pointer = some_function("value_I_have");
> > >        b) next_pointer    = some_other_function(current_pointer);
> > >        c) one_tupple      = yet_another_function(next_pointer);
> > >    If I can accomplish a,b,c, then I win and I don't have to do
> > questions
> > >    1..5 below.
> >
> >         Why not put a sequence field on the table so that you can do:
> >
> >         select * from table where rowid = n;     -or-
> >         select * from table where rowid = n - 1; -or-
> >         select * from table where rowid = n + 1; -or-
> >         select * from table where rowid >= n and rowid <= n+x;
> >
> >         And create the index on rowid?
>
> It works no better than any other indexed field unless you disallow
> deletes.
>
> if aggregates were able to use indexes you could do:
>
> select min(n) from table where rowid >n;
>
> and then
>
> select * from table where n = n_found by_last_previous_select;
>
> but as they don't you would get very poor performance from the first
> select;
>
> This could be simulated by fetching only the first row from a cursor
> sorted on the field.
>
> So the real solution would be to use indexes for sorting, maybe at first
> for single field sorts.
>
> Then one could just do:
>
> --8<---------
> begin;
> declare part_cursor cursor for
>   select * from part_table
>     where indexed_field > 'last_value'
>     order by indexed_field ;
>
> fetch 10 from  part_cursor;
>
> close part_cursor;
> end;
> --8<---------
>
> for moving backwards you would of course use '<' and 'desc' in the
> select clause.
>
> Unfortunately it does not work nearly fast enough for big tables as
> often almost the whole table is copied and then sorted before you get
> your few rows.
>
It actually works for forward queries if you have a b-tree index on
indexed_field and if you omit the 'order by' clause, but for all the
wrong reasons ;), i.e. the backend performs an index scan and so does
the right thing.

I just tried it on a 2 780 000 record db (web access log, table size
500MB), and the cursor was created and the data was returned
immediately (fetch 3 in my_cursor).

I did:

begin;

declare my_cursor cursor for select * from access_log where adate >
'28/12/97';

fetch 3 in my_cursor;

end;
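
To keep browsing forward, one would presumably just reopen the cursor
starting after the last row seen. A sketch along the same lines -- here
'29/12/97' merely stands in for whatever adate value the previous fetch
returned last, it is not from my actual run:

begin;

declare my_cursor cursor for select * from access_log where adate >
'29/12/97';

fetch 3 in my_cursor;

end;

Rows sharing the adate of the last row already seen would be skipped by
the '>' comparison, so this assumes the browse key is (nearly) unique.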


It does not work for moving backwards as the backend does the index scan
in forward direction regardless of the comparison being '>' or '<';

I then tried adding the 'order by adate desc' to the cursor definition. The
result was a complete disaster (contained only by manually killing the
postgres processes): 'fetch 3 in my_cursor' ran for about 10 min and in
the process exhausted 2.3GB of disk space for pg_sort files.

> Even more unfortunately client side development tools like Access or
> Delphi seem to rely on sorted queries using indexes for sorting and as a
> result perform very poorly with PostgreSQL in their default modes.
>
> OTOH, it usually shows poor database design if you can't specify your
> search criteria precisely enough to limit the number of rows to some
> manageable level in interactive applications. It really is the task of
> the database server to look up things. The poor user should not have to
> wade through zillions of records looking for the one she wants
> (even though you can do it quite effectively using ISAM).
>
> OTOOH, it would be very hard for general client side tools to do without
> keyed access, so adding the use of indexes for sorting should be given at
> least some priority.
>
Maybe a quicker and easier, though not 'future-proof', fix (as it uses
undocumented and possibly soon-to-change features) would be to reverse
the index scan direction for the '<' operator?

Vadim: how hard would implementing this proposal be?

(as I think that making the optimiser also optimize ordering would be
quite a big undertaking)

-----------------------
Hannu Krosing
Trust-O-Matic OÜ


Re: Browsing the tables / why pgsql does not perform well (with temp fix)

From
Jan Vicherek
Date:
On Sat, 24 Jan 1998, Hannu Krosing wrote:

> > if aggregates were able to use indexes you could do:
> >
> > select min(n) from table where rowid >n;
> >
> > and then
> >
> > select * from table where n = n_found by_last_previous_select;
> >
> > but as they don't you would get very poor performance from the first
> > select;

  This looks good. No transactions necessary, no locking, no multiple-row
copying.

  How hard would it be to make aggregates able to use indexes?

  Could I manage it in a day? (10 hours)


> > This could be simulated by fetching only the first row from a cursor
> > sorted on the field.
> >
> > So the real solution would be to use indexes for sorting, maybe at first
> > for single field sorts.

    How many hours would that take to write?

> > Then one could just do:
> >
> > --8<---------
> > begin;
> > declare part_cursor cursor for
> >   select * from part_table
> >     where indexed_field > 'last_value'
> >     order by indexed_field ;
> >
> > fetch 10 from  part_cursor;
> >
> > close part_cursor;
> > end;
> > --8<---------
> >
> > for moving backwards you would of course use '<' and 'desc' in the
> > select clause.
> >
> > Unfortunately it does not work nearly fast enough for big tables as
> > often almost the whole table is copied and then sorted before you get
> > your few rows.

    Adding code that makes sorting use indices would fix this problem,
right?

        Jan

 -- Gospel of Jesus is the saving power of God for all who believe --
Jan Vicherek ## To some, nothing is impossible. ##  www.ied.com/~honza
    >>>    Free Software Union President  ...  www.fslu.org    <<<
Interactive Electronic Design Inc.    -#-    PGP: finger honza@ied.com


Re: Browsing the tables / why pgsql does not perform well (with temp fix)

From
The Hermit Hacker
Date:
On Sat, 24 Jan 1998, Jan Vicherek wrote:

> On Sat, 24 Jan 1998, Hannu Krosing wrote:
>
> > > if aggregates were able to use indexes you could do:
> > >
> > > select min(n) from table where rowid >n;
> > >
> > > and then
> > >
> > > select * from table where n = n_found by_last_previous_select;
> > >
> > > but as they don't you would get very poor performance from the first
> > > select;
>
>   This looks good. No transactions necessary, no locking, no multiple-row
> copying.

    The SELECT above will create a READ lock on the table, preventing
UPDATES from happening for the duration of the SELECT.  There is *no* way
of getting around or away from this lock...

> > > So the real solution would be to use indexes for sorting, maybe at first
> > > for single field sorts.
>
>     How many hours would that take to write ?

    As Bruce said in another posting, he isn't even sure *where* to
start on this, so I'd say over a month, if not longer...over a month in
the sense that as of the end of this week, there will be no new
developments on the source tree, only stabilization and bug fixes...




Marc G. Fournier
Systems Administrator @ hub.org
primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org


Re: Browsing the tables / why pgsql does not perform well (with temp fix)

From
Jan Vicherek
Date:
On Sat, 24 Jan 1998, The Hermit Hacker wrote:

> On Sat, 24 Jan 1998, Jan Vicherek wrote:
>
> > On Sat, 24 Jan 1998, Hannu Krosing wrote:
> >
> > > > if aggregates were able to use indexes you could do:
> > > >
> > > > select min(n) from table where rowid >n;
> > > >
> > > > and then
> > > >
> > > > select * from table where n = n_found by_last_previous_select;
> > > >
> > > > but as they don't you would get very poor performance from the first
> > > > select;
> >
> >   This looks good. No transactions necessary, no locking, no multiple-row
> > copying.
>
>     The SELECT above will create a READ lock on the table, preventing
> UPDATES from happening for the duration of the SELECT.  There is *no* way
> of getting around or away from this lock...

   Yes, you are correct.
 In addition, there will be no long-lasting "begin - declare cursor -
end" statement, so the table will not get locked against updates for
minutes / hours when a person wants to "browse" the table ...

  .. hmm, but there still may be copying of multiple rows if the "rowid"
field is not unique.

   Again, accessing the "next" item in the index is the only solution. The
next item in the index has a tid which points to the next row in the browsed
table.
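
For what it's worth, later PostgreSQL releases let you select and match on
the "ctid" system column, which is exactly such a tuple id, so the idea can
at least be played with from plain SQL -- keeping in mind that ctids are
physical positions and change when a row is updated or the table is
vacuumed, so this is only illustrative; mytable, indexedfield and the
'(0,1)' value are placeholders:

--8<---------
-- fetch the current row together with its tuple id
select ctid, * from mytable where indexedfield = 'what_user_requests';

-- later, re-fetch that exact physical row by its tuple id
select * from mytable where ctid = '(0,1)';
--8<---------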

       Jan

 -- Gospel of Jesus is the saving power of God for all who believe --
Jan Vicherek ## To some, nothing is impossible. ##  www.ied.com/~honza
    >>>    Free Software Union President  ...  www.fslu.org    <<<
Interactive Electronic Design Inc.    -#-    PGP: finger honza@ied.com


Re: Browsing the tables / why pgsql does not perform well (with temp fix)

From
The Hermit Hacker
Date:
> >
> >     The SELECT above will create a READ lock on the table, preventing
> > UPDATES from happening for the duration of the SELECT.  There is *no* way
> > of getting around or away from this lock...
>
>    Yes, you are correct.
>  In addition, there will be no long-lasting "begin - declare cursor -
> end" statement, so the table will not get locked against updates for
> minutes / hours when a person wants to "browse" the table ...

    Actually, here I believe you are wrong.  Bruce, please correct me
if I'm wrong, but it would be faster for you to do the
begin; declare...; move...; fetch...; end; than to do a straight SELECT.

    I'm not *certain* about this, but the way I believe it works
is that if you do:

begin;
declare mycursor cursor for select * from table order by field;
move forward 20 in mycursor;
fetch 20 in mycursor;
end;

    The SELECT/ORDER BY is done in the backend, as is the MOVE/FETCH
before returning any data to the front end.  So, now you are returning
let's say 100 records to the front end, instead of the whole table.  If
you do a SELECT, it will return *all* the records to the front end.

    So, I would imagine that it would take slightly longer to SELECT all
records and send them all to the front end than it would to SELECT all
records and just return the 100 that you want.

    Bruce, is this a correct assessment?


Marc G. Fournier
Systems Administrator @ hub.org
primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org


Re: Browsing the tables / why pgsql does not perform well (with temp fix)

From
Jan Vicherek
Date:
On Sun, 25 Jan 1998, The Hermit Hacker wrote:

> > >
> > >     The SELECT above will create a READ lock on the table, preventing
> > > UPDATES from happening for the duration of the SELECT.  There is *no* way
> > > of getting around or away from this lock...
> >
> >    Yes, you are correct.
> >  In addition, there will be no long-lasting "begin - declare cursor -
> > end" statement, so the table will not get locked against updates for
> > minutes / hours when a person wants to "browse" the table ...
>
>     Actually, here I believe you are wrong.

  If I were to select the whole table at once, I would be wrong.
 However, with this "browsing" business I would only do (if aggregates
used indices):

 CurrentValue = NextValue;
 SELECT min(IndexedField) as NextValue from MyTable where IndexedField > CurrentValue;
 SELECT * from MyTable where IndexedField = NextValue;
 <return * to the user so he can look at it for 5 hours>

   Since the above locks the table only during the SELECTs (which are very
short [ < .5 seconds ]), other people can update the table while one user
stares at the results for 5 hours. The first SELECT is very short because
we assume that aggregates can use indices. The second SELECT is very short
because the WHERE clause uses only a field on which we have an index.

 After the 5 hours the user decides to see the next record, so we execute
those 2 SELECT statements again.
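
 Browsing backwards would presumably be the mirror image of the same two
steps, still assuming index-aware aggregates (a sketch, not something I
have actually run):

 SELECT max(IndexedField) as PrevValue from MyTable where IndexedField < CurrentValue;
 SELECT * from MyTable where IndexedField = PrevValue;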

 The only problem with the above is that if IndexedField isn't unique, the
SELECT will return multiple records instead of only one. (The user can
see only one record on the screen at a time.) So this would require the
code between the database and the application to handle multiple-row
results of the second SELECT statement. An unnecessary complication; a
possible way around it is sketched below.
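
 One conceivable way around the non-unique key -- sketched here on the
assumption that the table also carries a unique "rowid" column, as was
suggested earlier in the thread, and still assuming index-aware
aggregates -- would be to browse on the pair (IndexedField, rowid), so
that the "next" position is always unique:

 -- try another row with the same IndexedField value first
 SELECT min(rowid) as NextRowid from MyTable
   where IndexedField = CurrentValue and rowid > CurrentRowid;
 -- if none was found, step to the next IndexedField value ...
 SELECT min(IndexedField) as NextValue from MyTable where IndexedField > CurrentValue;
 -- ... and take its lowest rowid there
 SELECT min(rowid) as NextRowid from MyTable where IndexedField = NextValue;
 -- either way, the row itself is then fetched by the unique rowid
 SELECT * from MyTable where rowid = NextRowid;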

  This situation would be solved if I could retrieve a (valid) record
based not on the (non-unique) IndexedField, but based on the "tid" of the
record. This cannot currently be done through SQL (as the FAQ says).

<joke_that_might_become_a_reality>
 I'm tempted to implement an extension to the current postgres' SQL
language that would allow me to do :

  CREATE INDEX MyIndex ON MyTable (IndexField);
  SELECT FIRST(Tid) AS MyFirstTid FROM MyIndex WHERE IndexField = 'what_user_requests';
  SELECT * from MyTable where Tid = MyFirstTid;
  SELECT NEXT(Tid) AS MyNextTid FROM MyIndex WHERE Tid = MyFirstTid;

</joke_that_might_become_a_reality>

  I'll ask on the developers list whether this is technically possible, given
the access that the SQL processing routines have to the back-end index
processing routines.

> begin;
> declare mycursor cursor for select * from table order by field;
> move forward 20 in mycursor;
> fetch 20 in mycursor;
> end;

  the application would have to handle multiple-row results coming from
the backend in this case.


         Thanx for all your input,

            keep it coming,

                    Jan

 -- Gospel of Jesus is the saving power of God for all who believe --
Jan Vicherek ## To some, nothing is impossible. ##  www.ied.com/~honza
    >>>    Free Software Union President  ...  www.fslu.org    <<<
Interactive Electronic Design Inc.    -#-    PGP: finger honza@ied.com