Обсуждение: tsearch comments

Поиск
Список
Период
Сортировка

tsearch comments

От
Björn Metzdorf
Дата:
Hi,

I was browsing the archives for anything related to tsearch and stumbled
across a post from Christopher Kings-Lynne in Nov 2002. He suggested to have
txt2txtidx as an index function, but he had some problems implementing his
idea.

But his idea works very fine. Just apply the attached patch (against 7.3.1),
install tsearch as normal, and then

CREATE INDEX my_test_idx ON mytable using gist(txt2txtidx(mycolumn));

assuming "mycolumn" is a text-type column.

Then you can search "mycolumn" with

SELECT * FROM mytable WHERE txt2txtidx(mycolumn) ## 'patch&gist';

This method has several advantages:

- you don't have to extend your table with another column
- pg_dump's output won't be bloated
- you don't have to use a special trigger, the index keeps itself up to
date!

Besides, we tested tsearch also with arabian and chinese input, and it works
perfectly with unicode databases as well (although it must be the exact same
phrase, since the builtin stemmer cannot stem chinese :).

Oleg: Can you search your memory why txt2txtidx possibly should not be
marked as "iscachable"?
The rest: Is there perhaps a problem with this approach that I am not aware
of?

Regards,
Bjoern

Вложения

Re: tsearch comments

От
Björn Metzdorf
Дата:
> Oleg: Can you search your memory why txt2txtidx possibly should not be
> marked as "iscachable"?

Well I now see where the problem is, the function does not always have to
return the same if the arguments are the same (the data might have changed
and there may be more or less matching entries). Any hints how to overcome
this?

Regards,
Bjoern


Re: tsearch comments

От
Oleg Bartunov
Дата:
On Wed, 22 Jan 2003, [iso-8859-1] BjЖrn Metzdorf wrote:

> > Oleg: Can you search your memory why txt2txtidx possibly should not be
> > marked as "iscachable"?
>
> Well I now see where the problem is, the function does not always have to
> return the same if the arguments are the same (the data might have changed
> and there may be more or less matching entries). Any hints how to overcome
> this?

No way, Bjoern. Think about stemming, for example.

>
> Regards,
> Bjoern
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: Have you searched our list archives?
>
> http://archives.postgresql.org
>

    Regards,
        Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83


Re: tsearch comments

От
"Bjorn Metzdorf"
Дата:
> > Well I now see where the problem is, the function does not always have
to
> > return the same if the arguments are the same (the data might have
changed
> > and there may be more or less matching entries). Any hints how to
overcome
> > this?
>
> No way, Bjoern. Think about stemming, for example.

Do index functions always have to be marked "iscachable" ?

Btw, the error message was confusing. It said that the function has to be
marked "isImmutable", but there is no such attribute, instead "iscachable"
had to be used. This seems to be a bug.

Regards,
Bjoern


Re: tsearch comments

От
"Bjorn Metzdorf"
Дата:
> Do index functions always have to be marked "iscachable" ?
>
> Btw, the error message was confusing. It said that the function has to be
> marked "isImmutable", but there is no such attribute, instead "iscachable"
> had to be used. This seems to be a bug.

The manual says:

"All functions and operators used in an index definition must be immutable,
that is, their results must depend only on their input arguments and never
on any outside influence (such as the contents of another table or the
current time). This restriction ensures that the behavior of the index is
well-defined. To use a user-defined function in an index, remember to mark
the function immutable when you create it."

Well, in the tsearch case the results don't depend on any influence outside
of the function. The stemmer is integrated in the function and we don't use
the current time nor another table. So it should be safe to mark that
function "iscachable" or when it is fixed "isimmutable", am I right?

Regards,
Bjoern


Re: tsearch comments

От
Oleg Bartunov
Дата:
Bjorn,

you'll get  problem with your approach not to create additional columns
if index will not used ! In our approach sequential scan will use
txtidx column. I don't understand what error you're comment.
Probably I miss something. We are working on new version of tsearch
and alpha version will be available next week. Are you sure
'iscacheability' should be configureable somehow ? Teodor, what do
you think ?

    Oleg

On Thu, 23 Jan 2003, Bjorn Metzdorf wrote:

> > Do index functions always have to be marked "iscachable" ?
> >
> > Btw, the error message was confusing. It said that the function has to be
> > marked "isImmutable", but there is no such attribute, instead "iscachable"
> > had to be used. This seems to be a bug.
>
> The manual says:
>
> "All functions and operators used in an index definition must be immutable,
> that is, their results must depend only on their input arguments and never
> on any outside influence (such as the contents of another table or the
> current time). This restriction ensures that the behavior of the index is
> well-defined. To use a user-defined function in an index, remember to mark
> the function immutable when you create it."
>
> Well, in the tsearch case the results don't depend on any influence outside
> of the function. The stemmer is integrated in the function and we don't use
> the current time nor another table. So it should be safe to mark that
> function "iscachable" or when it is fixed "isimmutable", am I right?
>
> Regards,
> Bjoern
>

    Regards,
        Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83


Re: tsearch comments

От
Björn Metzdorf
Дата:
> txtidx column. I don't understand what error you're comment.

I think "iscachable" and "isimmutable" are mixed up. "CREATE FUNTION" does
not accept "isimmutable".

> Probably I miss something. We are working on new version of tsearch
> and alpha version will be available next week. Are you sure

Great news! Any comments on the new version? Snowball support? txt2txtidx
without morphology?

Regards,
Bjoern


Re: tsearch comments

От
Björn Metzdorf
Дата:
> > Yes, I don't see that problem either. The locale and the configuration
of
> > dictionaries do not change while running the postmaster, do they?
> I forgot parser too.
>
> Parser and dictionaries will be changed after gmake install
(contrib/tsearch)
> :), without restarting postmaster.
>
> In common case, we can't set iscachable option to txt2txtidx, but if you
know
> what you do you can set it. May be
> some comments must be included in README.

Ok, but then it should be sufficient to recreate the txt2txtidx function
and/or the index after a change of parser and dictionaries. So generally
spoken, the index function approach to tsearch works, if you take care of
that. I gladly recreate the function and/or index from time to time, if I
can do without that additional column, bloated dump and slow trigger.

This is very good news, as this is a very easy approach to having an easy to
use fulltextsearch in postgresql.

> > Great news! Any comments on the new version? Snowball support?
txt2txtidx
> > without morphology?
> Snowball, ISpell....
> SQL-level for configure parsers and dictionaries.
> SQL-level for configure map (which dictionary for each type of lexem).
> Just wait :)

Wow, sounds great!

Regards,
Bjoern


I was spoiled by the MySQL timestamp field

От
"Alan T. Miller"
Дата:
As someone who is just getting started with PostygreSQL from years working
with MySQL, it appears that the timestamp data type does not behave in the
way it did with MySQL. I got used to just defining a column as a timestamp
and letting the database throw the latest time stamp in there whenever a row
was updated. Is there anything simular in PosgreSQL? How can I accomplish
something simular inside the database, or am I stuck populating the field in
some manner as in the following example

update blah blah blah timestamp = NOW()

Thanks,

Alan


Re: I was spoiled by the MySQL timestamp field

От
Björn Metzdorf
Дата:
> As someone who is just getting started with PostygreSQL from years working
> with MySQL, it appears that the timestamp data type does not behave in the
> way it did with MySQL. I got used to just defining a column as a timestamp
> and letting the database throw the latest time stamp in there whenever a
row
> was updated. Is there anything simular in PosgreSQL? How can I accomplish
> something simular inside the database, or am I stuck populating the field
in
> some manner as in the following example

There is no such datatype in postgresql. If you just need the current time
inserted on INSERT and not on UPDATE, then you can declare that column with
"default now()". Else you will need to install a trigger, then you can have
exactly the same behaviour as with mysql.

Regards,
Bjoern


Re: tsearch comments

От
Björn Metzdorf
Дата:
> Ok, but then it should be sufficient to recreate the txt2txtidx function
> and/or the index after a change of parser and dictionaries. So generally
> spoken, the index function approach to tsearch works, if you take care of
> that. I gladly recreate the function and/or index from time to time, if I
> can do without that additional column, bloated dump and slow trigger.
>
> This is very good news, as this is a very easy approach to having an easy
to
> use fulltextsearch in postgresql.

I talked a bit more with Oleg and Teodor about this index function approach
and we came to the conclusion that it is safe to use (if you take care of
the above), but it might be a bit slower than the original column based
approach. That it because the used operators are defined with RECHECK, and
with the index function approach the RECHECK is against an (expensive)
function instead of "raw" data in a column.

Btw. the "iscachable" has another advantage, it seems to really speed up the
search. I have done tests again 250000 entries, mostly nicknames based on
fantasy (so the english stemmer has not much to do), and the first search
for a name takes about 0.5 - 2.5 seconds whereas all subsequent searches for
the same name (even with fresh inserted data) take less than 0.1 seconds. Or
does this have nothing to do with "iscachable"?

Regards,
Bjoern




Re: tsearch comments

От
Tom Lane
Дата:
"Bjorn Metzdorf" <bm@turtle-entertainment.de> writes:
> Btw, the error message was confusing. It said that the function has to be
> marked "isImmutable", but there is no such attribute, instead "iscachable"
> had to be used. This seems to be a bug.

Actually the preferred syntax in 7.3 is

    CREATE FUNCTION ... LANGUAGE foo IMMUTABLE

For awhile during 7.3 development you had to write WITH (isImmutable)
but we changed the syntax to be more SQL-spec-compatible.  This error
message seems not to have gotten fixed --- thanks for pointing it out.

            regards, tom lane

Re: I was spoiled by the MySQL timestamp field

От
dev@archonet.com
Дата:
> As someone who is just getting started with PostygreSQL from years working
> with MySQL, it appears that the timestamp data type does not behave in the
> way it did with MySQL.

Much as I like MySQL, it can sometimes be a little *too* helpful.

> I got used to just defining a column as a timestamp
> and letting the database throw the latest time stamp in there whenever a
> row
> was updated. Is there anything simular in PosgreSQL?

When you create the table do something like:

CREATE TABLE foo (
  bar timestamp DEFAULT now(),
  ...
);

You can of course do this with any column-type and value. See the
SQL-reference for details.

- Richard Huxton

Re: I was spoiled by the MySQL timestamp field

От
will trillich
Дата:
On Thu, Jan 23, 2003 at 04:26:31AM -0800, Alan T. Miller wrote:
> As someone who is just getting started with PostygreSQL from
> years working with MySQL, it appears that the timestamp data
> type does not behave in the way it did with MySQL. I got used
> to just defining a column as a timestamp and letting the
> database throw the latest time stamp in there whenever a row
> was updated. Is there anything simular in PosgreSQL? How can I
> accomplish something simular inside the database, or am I stuck
> populating the field in some manner as in the following example
>
> update blah blah blah timestamp = NOW()

triggers or rules can do that with any timestamp (or timestamp(0)
-- no partial-seconds) field. here's a "rules" approach:

    create table _something (
        id serial,
        dat text,
        freshened timestamp(0),
        primary key ( id )
    );

    create view something as
    select
        id,
        dat,
        freshened
    from
        _something;

    create rule something_add as
    on insert to something
    do instead (
        insert into _something (
            --id,
            dat,
            freshened
        ) values (
            --let id take care of itself,
            NEW.dat,
            current_timestamp
        );
    );

    create rule something_edit as
    on update to something
    do instead (
        update _something set
            --id = leave it alone,
            dat  = NEW.dat,
            freshened = current_timestamp
        where
            id   = NEW.id
        ;
    );

then you can just

    insert into something (dat) values ('yada yada');
    update something set dat = 'here we go' where id = 23978;

and "freshened" takes care of itself.

--
There are 10 kinds of people:
ones that get binary, and ones that don't.

will@serensoft.com
http://sourceforge.net/projects/newbiedoc -- we need your brain!
http://www.dontUthink.com/ -- your brain needs us!

Looking for a firewall? Do you think smoothwall sucks? You're
probably right... Try the folks at http://clarkconnect.org/ !

Re: I was spoiled by the MySQL timestamp field

От
Medi Montaseri
Дата:
Of course the proposed solution solves the INSERTs....you can also set a
trigger that
on UPDATE set the value to 'now()'

dev@archonet.com wrote:

>>As someone who is just getting started with PostygreSQL from years working
>>with MySQL, it appears that the timestamp data type does not behave in the
>>way it did with MySQL.
>>
>>
>
>Much as I like MySQL, it can sometimes be a little *too* helpful.
>
>
>
>>I got used to just defining a column as a timestamp
>>and letting the database throw the latest time stamp in there whenever a
>>row
>>was updated. Is there anything simular in PosgreSQL?
>>
>>
>
>When you create the table do something like:
>
>CREATE TABLE foo (
>  bar timestamp DEFAULT now(),
>  ...
>);
>
>You can of course do this with any column-type and value. See the
>SQL-reference for details.
>
>- Richard Huxton
>
>---------------------------(end of broadcast)---------------------------
>TIP 4: Don't 'kill -9' the postmaster
>
>




Re: tsearch comments

От
Teodor Sigaev
Дата:
 > Yes, I don't see that problem either. The locale and the configuration of
 > dictionaries do not change while running the postmaster, do they?
I forgot parser too.

Parser and dictionaries will be changed after gmake install (contrib/tsearch)
:), without restarting postmaster.

In common case, we can't set iscachable option to txt2txtidx, but if you know
what you do you can set it. May be
some comments must be included in README.


Björn Metzdorf wrote:
>>txtidx column. I don't understand what error you're comment.
>
>
> I think "iscachable" and "isimmutable" are mixed up. "CREATE FUNTION" does
> not accept "isimmutable".
>
>
>>Probably I miss something. We are working on new version of tsearch
>>and alpha version will be available next week. Are you sure
>
>
> Great news! Any comments on the new version? Snowball support? txt2txtidx
> without morphology?
Snowball, ISpell....
SQL-level for configure parsers and dictionaries.
SQL-level for configure map (which dictionary for each type of lexem).
Just wait :)

--
Teodor Sigaev
teodor@stack.net



Re: I was spoiled by the MySQL timestamp field

От
"Matthew Nuzum"
Дата:
One word of caution, I *think* you want quotes around the 'now()'
statement in your table definition. Otherwise your default value will be
the instant the *table* was created, not the instant your insert
happened.

I too had an adjustment period when switching from MySQL to postgres.
However, I think you'll find that if you use a RULE to implement this
feature you will soon become addicted to PostgreSQL's advanced feature
set.

You can learn more about rules by going to
http://www.ca.postgresql.org/docs/aw_pgsql_book/node124.html where you
can find a brief description and example, or for more detail, you can go
to
http://www.ca.postgresql.org/users-lounge/docs/7.2/postgres/rules-insert
.html

I attained all of my database knowledge through hands on experience;
much of it on MySQL.  Switching to PostgreSQL was somewhat traumatic
because it has a lot more power and a lot more features.  Once I got a
handle on things like Views, Triggers and Rules, I have become somewhat
dependant on them and I haven't been able to use MySQL for anything more
than the most basic of applications.  I guess that's just a warning...
There may be no turning back.

--
Matthew Nuzum
www.bearfruit.org
cobalt@bearfruit.org


> -----Original Message-----
> From: Björn Metzdorf [mailto:bm@turtle-entertainment.de]
> Sent: Thursday, January 23, 2003 7:32 AM
> To: Alan T. Miller; pgsql-general@postgresql.org
> Subject: Re: I was spoiled by the MySQL timestamp field
>
> > As someone who is just getting started with PostygreSQL from years
> working
> > with MySQL, it appears that the timestamp data type does not behave
in
> the
> > way it did with MySQL. I got used to just defining a column as a
> timestamp
> > and letting the database throw the latest time stamp in there
whenever a
> row
> > was updated. Is there anything simular in PosgreSQL? How can I
> accomplish
> > something simular inside the database, or am I stuck populating the
> field
> in
> > some manner as in the following example
>
> There is no such datatype in postgresql. If you just need the current
time
> inserted on INSERT and not on UPDATE, then you can declare that column
> with
> "default now()". Else you will need to install a trigger, then you can
> have
> exactly the same behaviour as with mysql.
>
> Regards,
> Bjoern



Re: I was spoiled by the MySQL timestamp field

От
"codeWarrior"
Дата:
postgreSQL actually has a better implementation of timestamps.... In
mySQL -- You can have only 1 timestamp field...

In postgreSQL -- you can have as many  "create_dt" timestamp default 'now()'
fields as you want and on an insert -- they ALL get stamped whereas in
mySQL --ONLY 1 COLUMN gets updated....



""Alan T. Miller"" <amiller@hollywood101.com> wrote in message
news:003a01c2c2da$a90d10a0$6e01a8c0@webdev...
> As someone who is just getting started with PostygreSQL from years working
> with MySQL, it appears that the timestamp data type does not behave in the
> way it did with MySQL. I got used to just defining a column as a timestamp
> and letting the database throw the latest time stamp in there whenever a
row
> was updated. Is there anything simular in PosgreSQL? How can I accomplish
> something simular inside the database, or am I stuck populating the field
in
> some manner as in the following example
>
> update blah blah blah timestamp = NOW()
>
> Thanks,
>
> Alan
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
>     (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)



Re: I was spoiled by the MySQL timestamp field

От
"Luke Pascoe"
Дата:
Why not just use a DEFAULT NOW()?
ie:
    CREATE TABLE blah (
        time    TIMESTAMP NOT NULL DEFAULT NOW()
        );

I know it works for DATETIME types, don't know about TIMESTAMP but I assume
it would be the same.

Luke.

----- Original Message -----
From: "will trillich" <will@serensoft.com>
To: <pgsql-general@postgresql.org>
Sent: Friday, January 24, 2003 7:13 AM
Subject: Re: [GENERAL] I was spoiled by the MySQL timestamp field


> On Thu, Jan 23, 2003 at 04:26:31AM -0800, Alan T. Miller wrote:
> > As someone who is just getting started with PostygreSQL from
> > years working with MySQL, it appears that the timestamp data
> > type does not behave in the way it did with MySQL. I got used
> > to just defining a column as a timestamp and letting the
> > database throw the latest time stamp in there whenever a row
> > was updated. Is there anything simular in PosgreSQL? How can I
> > accomplish something simular inside the database, or am I stuck
> > populating the field in some manner as in the following example
> >
> > update blah blah blah timestamp = NOW()
>
> triggers or rules can do that with any timestamp (or timestamp(0)
> -- no partial-seconds) field. here's a "rules" approach:
>
> create table _something (
> id serial,
> dat text,
> freshened timestamp(0),
> primary key ( id )
> );
>
> create view something as
> select
> id,
> dat,
> freshened
> from
> _something;
>
> create rule something_add as
> on insert to something
> do instead (
> insert into _something (
> --id,
> dat,
> freshened
> ) values (
> --let id take care of itself,
> NEW.dat,
> current_timestamp
> );
> );
>
> create rule something_edit as
> on update to something
> do instead (
> update _something set
> --id = leave it alone,
> dat  = NEW.dat,
> freshened = current_timestamp
> where
> id   = NEW.id
> ;
> );
>
> then you can just
>
> insert into something (dat) values ('yada yada');
> update something set dat = 'here we go' where id = 23978;
>
> and "freshened" takes care of itself.
>
> --
> There are 10 kinds of people:
> ones that get binary, and ones that don't.
>
> will@serensoft.com
> http://sourceforge.net/projects/newbiedoc -- we need your brain!
> http://www.dontUthink.com/ -- your brain needs us!
>
> Looking for a firewall? Do you think smoothwall sucks? You're
> probably right... Try the folks at http://clarkconnect.org/ !
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
>
> http://www.postgresql.org/users-lounge/docs/faq.html
>
>



Re: I was spoiled by the MySQL timestamp field

От
Joseph Shraibman
Дата:
See http://www.brasileiro.net:8080/postgres/cookbook/view-one-recipe.adp?recipe_id=10

Matthew Nuzum wrote:
> One word of caution, I *think* you want quotes around the 'now()'
> statement in your table definition. Otherwise your default value will be
> the instant the *table* was created, not the instant your insert
> happened.
>
> I too had an adjustment period when switching from MySQL to postgres.
> However, I think you'll find that if you use a RULE to implement this
> feature you will soon become addicted to PostgreSQL's advanced feature
> set.
>
> You can learn more about rules by going to
> http://www.ca.postgresql.org/docs/aw_pgsql_book/node124.html where you
> can find a brief description and example, or for more detail, you can go
> to
> http://www.ca.postgresql.org/users-lounge/docs/7.2/postgres/rules-insert
> .html
>
> I attained all of my database knowledge through hands on experience;
> much of it on MySQL.  Switching to PostgreSQL was somewhat traumatic
> because it has a lot more power and a lot more features.  Once I got a
> handle on things like Views, Triggers and Rules, I have become somewhat
> dependant on them and I haven't been able to use MySQL for anything more
> than the most basic of applications.  I guess that's just a warning...
> There may be no turning back.
>
> --
> Matthew Nuzum
> www.bearfruit.org
> cobalt@bearfruit.org
> =20
>
>
>>-----Original Message-----
>>From: Bj=F6rn Metzdorf [mailto:bm@turtle-entertainment.de]
>>Sent: Thursday, January 23, 2003 7:32 AM
>>To: Alan T. Miller; pgsql-general@postgresql.org
>>Subject: Re: I was spoiled by the MySQL timestamp field
>>=20
>>
>>>As someone who is just getting started with PostygreSQL from years
>>
>>working
>>
>>>with MySQL, it appears that the timestamp data type does not behave
>
> in
>
>>the
>>
>>>way it did with MySQL. I got used to just defining a column as a
>>
>>timestamp
>>
>>>and letting the database throw the latest time stamp in there
>
> whenever a
>
>>row
>>
>>>was updated. Is there anything simular in PosgreSQL? How can I
>>
>>accomplish
>>
>>>something simular inside the database, or am I stuck populating the
>>
>>field
>>in
>>
>>>some manner as in the following example
>>
>>=20
>>There is no such datatype in postgresql. If you just need the current
>
> time
>
>>inserted on INSERT and not on UPDATE, then you can declare that column
>>with
>>"default now()". Else you will need to install a trigger, then you can
>>have
>>exactly the same behaviour as with mysql.
>>=20
>>Regards,
>>Bjoern
>
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: Have you searched our list archives?
>
> http://archives.postgresql.org


Re: I was spoiled by the MySQL timestamp field

От
will trillich
Дата:
On Fri, Jan 24, 2003 at 09:57:04AM +1300, Luke Pascoe wrote:
> Why not just use a DEFAULT NOW()?
> ie:
>     CREATE TABLE blah (
>         time    TIMESTAMP NOT NULL DEFAULT NOW()
>         );
>
> I know it works for DATETIME types, don't know about TIMESTAMP
> but I assume it would be the same.

great for INSERT, but rather inert on UPDATE.

create table _gribnif (
    created  timestamp default current_timestamp,
    modified timestamp,
    other    text,
);

create view gribnif ...

create rule gribnif_edit as
on update to gribnif
do instead (
    update _gribnif set
        MODIFIED = CURRENT_TIMESTAMP,
        other    = NEW.other
    where
        id = NEW.id
    ;
);

--
There are 10 kinds of people:
ones that get binary, and ones that don't.

will@serensoft.com
http://sourceforge.net/projects/newbiedoc -- we need your brain!
http://www.dontUthink.com/ -- your brain needs us!

Looking for a firewall? Do you think smoothwall sucks? You're
probably right... Try the folks at http://clarkconnect.org/ !

Re: I was spoiled by the MySQL timestamp field

От
Lincoln Yeoh
Дата:
At 03:02 PM 1/23/03 -0500, Matthew Nuzum wrote:

>One word of caution, I *think* you want quotes around the 'now()'
>statement in your table definition. Otherwise your default value will be
>the instant the *table* was created, not the instant your insert
>happened.

Would plain: default 'now' work? If it does (it seems to), what is the
difference?

Link.


Re: I was spoiled by the MySQL timestamp field

От
Dennis Gearon
Дата:
What's the difference between 'now()' and 'current_timestamp' ?

will trillich wrote:
>
> On Fri, Jan 24, 2003 at 09:57:04AM +1300, Luke Pascoe wrote:
> > Why not just use a DEFAULT NOW()?
> > ie:
> >     CREATE TABLE blah (
> >         time    TIMESTAMP NOT NULL DEFAULT NOW()
> >         );
> >
> > I know it works for DATETIME types, don't know about TIMESTAMP
> > but I assume it would be the same.
>
> great for INSERT, but rather inert on UPDATE.
>
> create table _gribnif (
>         created  timestamp default current_timestamp,
>         modified timestamp,
>         other    text,
> );
>
> create view gribnif ...
>
> create rule gribnif_edit as
> on update to gribnif
> do instead (
>         update _gribnif set
>                 MODIFIED = CURRENT_TIMESTAMP,
>                 other    = NEW.other
>         where
>                 id = NEW.id
>         ;
> );
>
> --
> There are 10 kinds of people:
> ones that get binary, and ones that don't.
>
> will@serensoft.com
> http://sourceforge.net/projects/newbiedoc -- we need your brain!
> http://www.dontUthink.com/ -- your brain needs us!
>
> Looking for a firewall? Do you think smoothwall sucks? You're
> probably right... Try the folks at http://clarkconnect.org/ !
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo@postgresql.org so that your
> message can get through to the mailing list cleanly

--

Carpe Dancem ;-)
-----------------------------------------------------------------
Remember your friends while they are alive
-----------------------------------------------------------------
                         Sincerely, Dennis Gearon

Re: I was spoiled by the MySQL timestamp field

От
Tom Lane
Дата:
Dennis Gearon <gearond@cvc.net> writes:
> What's the difference between 'now()' and 'current_timestamp' ?

None AFAIK.  now() is historical Postgres usage, CURRENT_TIMESTAMP is
the SQL-spec-mandated spelling of the same functionality.

CURRENT_TIMESTAMP with an argument does something a bit different,
though.

http://www.ca.postgresql.org/users-lounge/docs/7.3/postgres/functions-datetime.html#FUNCTIONS-DATETIME-CURRENT

            regards, tom lane

Re: I was spoiled by the MySQL timestamp field

От
will trillich
Дата:
On Fri, Jan 24, 2003 at 10:55:26AM -0500, Tom Lane wrote:
> Dennis Gearon <gearond@cvc.net> writes:
> > What's the difference between 'now()' and 'current_timestamp' ?
>
> None AFAIK.  now() is historical Postgres usage, CURRENT_TIMESTAMP is
> the SQL-spec-mandated spelling of the same functionality.
>
> CURRENT_TIMESTAMP with an argument does something a bit different,
> though.
>
> http://www.ca.postgresql.org/users-lounge/docs/7.3/postgres/functions-datetime.html#FUNCTIONS-DATETIME-CURRENT

the one caveat is outlined at the bottom of that very page:

    All the date/time data types also accept the special literal
    value now to specify the current date and time. Thus, the
    following three all return the same result:

        SELECT CURRENT_TIMESTAMP;
        SELECT now();
        SELECT TIMESTAMP 'now';

    Note: You do not want to use the third form when specifying
    a DEFAULT clause while creating a table. The system will
    convert now to a timestamp as soon as the constant is
    parsed, so that when the default value is needed, the ##TIME
    OF THE TABLE CREATION## would be used! The first two forms
    will not be evaluated until the default value is used,
    because they are function calls. Thus they will give the
    desired behavior of defaulting to the time of row insertion.

[emphasis mine]

so careful using "now" when you mean to use "now()".
or better yet, "current_timestamp" for consistency.

--
There are 10 kinds of people:
ones that get binary, and ones that don't.

will@serensoft.com
http://sourceforge.net/projects/newbiedoc -- we need your brain!
http://www.dontUthink.com/ -- your brain needs us!

Looking for a firewall? Do you think smoothwall sucks? You're
probably right... Try the folks at http://clarkconnect.org/ !

Re: I was spoiled by the MySQL timestamp field

От
"Matthew Nuzum"
Дата:
The difference is when now() gets interpreted into a date.  Someone
please correct me if I'm wrong...

If you create a table at 15:00 Jan 26, 2003 and you use an unquoted
now() as the default value for a field, then each record will have it's
default value as 15:00 Jan 26, 2003.

If you quote the now(), then the default value for each newly created
record will be now() which is interpreted at the moment the record is
created.

I've used different rdbms so please forgive me if I'm documenting the
characteristics of a different system.

--
Matthew Nuzum
www.bearfruit.org
cobalt@bearfruit.org


> -----Original Message-----
> From: Lincoln Yeoh [mailto:lyeoh@pop.jaring.my]
> Sent: Friday, January 24, 2003 4:47 AM
> To: Matthew Nuzum; 'Björn Metzdorf'; 'Alan T. Miller'; pgsql-
> general@postgresql.org
> Subject: Re: [GENERAL] I was spoiled by the MySQL timestamp field
>
> At 03:02 PM 1/23/03 -0500, Matthew Nuzum wrote:
>
> >One word of caution, I *think* you want quotes around the 'now()'
> >statement in your table definition. Otherwise your default value will
be
> >the instant the *table* was created, not the instant your insert
> >happened.
>
> Would plain: default 'now' work? If it does (it seems to), what is the
> difference?
>
> Link.


Re: I was spoiled by the MySQL timestamp field

От
Tom Lane
Дата:
"Matthew Nuzum" <cobalt@bearfruit.org> writes:
> The difference is when now() gets interpreted into a date.  Someone
> please correct me if I'm wrong...

> If you create a table at 15:00 Jan 26, 2003 and you use an unquoted
> now() as the default value for a field, then each record will have it's
> default value as 15:00 Jan 26, 2003.

You're wrong...

now() is a function call and will not be folded into a constant.

AFAIR, the only case that does get folded to a constant is

    ... mycol timestamp default timestamp 'now'

since "timestamp 'now'" is Postgres' notation for a literal constant of
a specific datatype (timestamp in this case).  The string 'now' is
immediately fed to the timestamp datatype's input converter, and behold
it produces the current time, which is then stored as a timestamp
constant.

The notation 'now()' that various people have suggested is in fact
invalid, being not a function call but a literal --- but it seems that
the timestamp input converter is sloppy about detecting trailing garbage
in its input string.  You should get a "Bad timestamp external
representation" error from it, but at the moment you don't.

You can easily check the behavior for yourself rather than relying on
other people's assertions.  For example:

regression=# create table t1 (f1 timestamp default now(),
regression(# f2 timestamp default 'now',
regression(# f3 timestamp default timestamp 'now');
CREATE TABLE
regression=# \d t1
                                            Table "public.t1"
 Column |            Type             |                             Modifiers

--------+-----------------------------+-------------------------------------------------------------------
 f1     | timestamp without time zone | default now()
 f2     | timestamp without time zone | default 'now'
 f3     | timestamp without time zone | default '2003-01-26 16:51:11.246954'::timestamp without time zone

and even more directly:

regression=# insert into t1 default values;
INSERT 1003028 1
regression=# insert into t1 default values;
INSERT 1003029 1
regression=# select * from t1;
             f1             |             f2             |             f3
----------------------------+----------------------------+----------------------------
 2003-01-26 16:58:13.173561 | 2003-01-26 16:58:13.173561 | 2003-01-26 16:51:11.246954
 2003-01-26 16:58:14.323162 | 2003-01-26 16:58:14.323162 | 2003-01-26 16:51:11.246954
(2 rows)


(BTW, the reason 'now' without "timestamp" in front works is that this
is not a timestamp literal but a text literal, which will be coerced
to timestamp at runtime.)

            regards, tom lane

Re: I was spoiled by the MySQL timestamp field

От
Lincoln Yeoh
Дата:
At 05:00 PM 1/26/03 -0500, Tom Lane wrote:
>(BTW, the reason 'now' without "timestamp" in front works is that this
>is not a timestamp literal but a text literal, which will be coerced
>to timestamp at runtime.)

Is it known at the moment which of those methods the Postgresql team are
aiming to continue supporting for the near/medium future?

e.g. current_timestamp is guaranteed. now() for the forseeable future.
'now' for the next few versions.

Thanks,
Link.



Re: I was spoiled by the MySQL timestamp field

От
Tom Lane
Дата:
Lincoln Yeoh <lyeoh@pop.jaring.my> writes:
> At 05:00 PM 1/26/03 -0500, Tom Lane wrote:
>> (BTW, the reason 'now' without "timestamp" in front works is that this
>> is not a timestamp literal but a text literal, which will be coerced
>> to timestamp at runtime.)

> Is it known at the moment which of those methods the Postgresql team are
> aiming to continue supporting for the near/medium future?

AFAIK there are no plans to break any of them; though certainly
CURRENT_TIMESTAMP is the most future-proof, being spec-mandated.

            regards, tom lane

Re: I was spoiled by the MySQL timestamp field

От
will trillich
Дата:
On Sun, Jan 26, 2003 at 05:00:36PM -0500, Tom Lane wrote:
> now() is a function call and will not be folded into a constant.

> You can easily check the behavior for yourself rather than relying on
> other people's assertions.

note -- this is ALWAYS a good idea. especially when it's *me*
giving the assertion :)

reminds me of a high-school prof i had. he'd stop in the middle
of a lesson and ask if anybody knew the meaning of a word he'd
just used ("mote", for example); if not, he'd supply a
definition (small fly) and go on.

later we'd have a vocabulary test; any answers that parroted
what he'd said were always marked wrong, as he'd just made them
up off-the-cuff. (mote is a particle, as in gritty dust.)

double-check for yourself. don't take anybody's word for it.
KNOW, don't trust.

the best way to find out, is to FIND OUT.

--
There are 10 kinds of people:
ones that get binary, and ones that don't.

will@serensoft.com
http://sourceforge.net/projects/newbiedoc -- we need your brain!
http://www.dontUthink.com/ -- your brain needs us!

Looking for a firewall? Do you think smoothwall sucks? You're
probably right... Try the folks at http://clarkconnect.org/ !

Re: I was spoiled by the MySQL timestamp field

От
Lincoln Yeoh
Дата:
At 11:59 PM 1/26/03 -0600, will trillich wrote:

>On Sun, Jan 26, 2003 at 05:00:36PM -0500, Tom Lane wrote:
> > now() is a function call and will not be folded into a constant.
>
> > You can easily check the behavior for yourself rather than relying on
> > other people's assertions.
>
>note -- this is ALWAYS a good idea. especially when it's *me*
>giving the assertion :)

It's still good hear from the developers what they think the behaviour
should be, and would be.

If the devs, docs and software agree then it'll be fine to use the feature.
Otherwise if possible, it might be a better idea to use a different feature
to achieve what I want. Don't want to use a behaviour will soon be
deprecated/changed.

Of course, it's not good to pester you guys for everything either. But
current_timestamp vs now() vs 'now' would probably be a popular feature
enough to clarify.

Hope this doesn't waste too much bandwidth ;).
Link.


Re: I was spoiled by the MySQL timestamp field

От
Bruce Momjian
Дата:
FYI, I usually prefer triggers in cases where you want to modify/check
the row as it is being processed, and rules for modifying other
rows/tables as part of row processing, but both do work.

---------------------------------------------------------------------------

will trillich wrote:
> On Thu, Jan 23, 2003 at 04:26:31AM -0800, Alan T. Miller wrote:
> > As someone who is just getting started with PostygreSQL from
> > years working with MySQL, it appears that the timestamp data
> > type does not behave in the way it did with MySQL. I got used
> > to just defining a column as a timestamp and letting the
> > database throw the latest time stamp in there whenever a row
> > was updated. Is there anything simular in PosgreSQL? How can I
> > accomplish something simular inside the database, or am I stuck
> > populating the field in some manner as in the following example
> >
> > update blah blah blah timestamp = NOW()
>
> triggers or rules can do that with any timestamp (or timestamp(0)
> -- no partial-seconds) field. here's a "rules" approach:
>
>     create table _something (
>         id serial,
>         dat text,
>         freshened timestamp(0),
>         primary key ( id )
>     );
>
>     create view something as
>     select
>         id,
>         dat,
>         freshened
>     from
>         _something;
>
>     create rule something_add as
>     on insert to something
>     do instead (
>         insert into _something (
>             --id,
>             dat,
>             freshened
>         ) values (
>             --let id take care of itself,
>             NEW.dat,
>             current_timestamp
>         );
>     );
>
>     create rule something_edit as
>     on update to something
>     do instead (
>         update _something set
>             --id = leave it alone,
>             dat  = NEW.dat,
>             freshened = current_timestamp
>         where
>             id   = NEW.id
>         ;
>     );
>
> then you can just
>
>     insert into something (dat) values ('yada yada');
>     update something set dat = 'here we go' where id = 23978;
>
> and "freshened" takes care of itself.
>
> --
> There are 10 kinds of people:
> ones that get binary, and ones that don't.
>
> will@serensoft.com
> http://sourceforge.net/projects/newbiedoc -- we need your brain!
> http://www.dontUthink.com/ -- your brain needs us!
>
> Looking for a firewall? Do you think smoothwall sucks? You're
> probably right... Try the folks at http://clarkconnect.org/ !
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
>
> http://www.postgresql.org/users-lounge/docs/faq.html
>

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: I was spoiled by the MySQL timestamp field

От
Andrew Sullivan
Дата:
On Sun, Jan 26, 2003 at 05:00:36PM -0500, Tom Lane wrote:
> "Matthew Nuzum" <cobalt@bearfruit.org> writes:
> > The difference is when now() gets interpreted into a date.  Someone
> > please correct me if I'm wrong...

> now() is a function call and will not be folded into a constant.

I know it's not really related, but I think it might be worth noting
here, also, that now() is (was?  I don't have anything later than 7.2
handy for testing at the moment) resolves to the beginning of the
transaction.  If you need time to move forward during your
transaction, you need timeofday().

A


--
----
Andrew Sullivan                         204-4141 Yonge Street
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M2P 2A8
                                         +1 416 646 3304 x110


Re: tsearch comments

От
sector119@mail.ru
Дата:
HI

will we see sort by relevance at tsearch alpha version? :)

--
WBR, sector119

Вложения

Re: tsearch comments

От
Oleg Bartunov
Дата:
On Tue, 28 Jan 2003 sector119@mail.ru wrote:

> HI
>
> will we see sort by relevance at tsearch alpha version? :)
>

not sure. We concentrate our efforts, well, Teodor is working on
better configurability of tsearch like OpenFTS does.

It's not difficult to add rather naive relevance based on position
of lexem in document, for example. The question is do you like such
kind of relevancy ? Real ranking support (as in OpenFTS) require
separate tables to maintain coordinate information.
We want to keep tsearch as simple as it's and now we just add
better and friendly configurability. Do we need complicate tsearch ?
We already have OpenFTS which has most features people requested.


    Regards,
        Oleg


>

    Regards,
        Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83


Re: tsearch comments

От
Uros Gruber
Дата:
Hi!

I think that this would be nice. OpenFTS is great, but i would
be great if this would be implement in real pg functions.

I think that indexim would be great if pg make it by itself.

Also it could be great if we could define order of weight of
columns.

bye Uros

I
On 28.01.2003 at 11:53:26, Oleg Bartunov <oleg@sai.msu.su>
wrote:

> On Tue, 28 Jan 2003 sector119@mail.ru wrote:
>
> > HI
> >
> > will we see sort by relevance at tsearch alpha version? :)
> >
>
> not sure. We concentrate our efforts, well, Teodor is working
> on
> better configurability of tsearch like OpenFTS does.
>
> It\\\'s not difficult to add rather naive relevance based on
> position
> of lexem in document, for example. The question is do you
like
> such
> kind of relevancy ? Real ranking support (as in OpenFTS)
> require
> separate tables to maintain coordinate information.
> We want to keep tsearch as simple as it\\\'s and now we just
add
> better and friendly configurability. Do we need complicate
> tsearch ?
> We already have OpenFTS which has most features people
> requested.
>


Re: tsearch comments

От
Tomaz Borstnar
Дата:
At 13:47 28.1.2003 +0300, Oleg Bartunov wrote the following message:
>We want to keep tsearch as simple as it's and now we just add
>better and friendly configurability. Do we need complicate tsearch ?

Sometimes you need that because some other app is putting data into database.

>We already have OpenFTS which has most features people requested.

But integration into other stuff could use some work.




Re: tsearch comments

От
Oleg Bartunov
Дата:
On Tue, 28 Jan 2003, Uros Gruber wrote:

> Hi!
>
> I think that this would be nice. OpenFTS is great, but i would
> be great if this would be implement in real pg functions.
>
> I think that indexim would be great if pg make it by itself.
>
> Also it could be great if we could define order of weight of
> columns.

Could you elaborate this ?

>
> bye Uros
>
> I
> On 28.01.2003 at 11:53:26, Oleg Bartunov <oleg@sai.msu.su>
> wrote:
>
> > On Tue, 28 Jan 2003 sector119@mail.ru wrote:
> >
> > > HI
> > >
> > > will we see sort by relevance at tsearch alpha version? :)
> > >
> >
> > not sure. We concentrate our efforts, well, Teodor is working
> > on
> > better configurability of tsearch like OpenFTS does.
> >
> > It\\\'s not difficult to add rather naive relevance based on
> > position
> > of lexem in document, for example. The question is do you
> like
> > such
> > kind of relevancy ? Real ranking support (as in OpenFTS)
> > require
> > separate tables to maintain coordinate information.
> > We want to keep tsearch as simple as it\\\'s and now we just
> add
> > better and friendly configurability. Do we need complicate
> > tsearch ?
> > We already have OpenFTS which has most features people
> > requested.
> >
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
>

    Regards,
        Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83


Re: tsearch comments

От
Oleg Bartunov
Дата:
On Tue, 28 Jan 2003, Tomaz Borstnar wrote:

> At 13:47 28.1.2003 +0300, Oleg Bartunov wrote the following message:
> >We want to keep tsearch as simple as it's and now we just add
> >better and friendly configurability. Do we need complicate tsearch ?
>
> Sometimes you need that because some other app is putting data into database.
>

So, you'll end up with something like OpenFTS, which was designed as
*engine* to be integrated into  other apps. The real problem is that
OpenFTS is written in perl and porting to  other languages is
difficult task. new tsearch already has some features of OpenFTS and
we're slowly moving to idea we should rewrite OpenFTS in 'C',
so writing interfaces would be much simpler.
There is major problem with moving ALL features of OpenFTS to tsearch
we don't know how to resolve - generation of headlines, text fragments
with hilighted query terms. Once we resolve that we could concentrate
on tsearch with ranking support.

> >We already have OpenFTS which has most features people requested.
>
> But integration into other stuff could use some work.
>
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
>

    Regards,
        Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83


Re: tsearch comments

От
Uros Gruber
Дата:
Hi!

OpenFTS is great so far. But for example. We are working on
directory engine and we would like to use some ranking on
data we get from tsearch. The we have data like Page caption,
description, keywords, url, page content .... and then we
have another project we we search on complitely different
kind of data.

Using full text search in this scenario is very easy to use,
because everything is in db and this is done on db level.
Developer do not need to worry about that how to index
something. It great because you can say this column is
fulltext indexed.

Second stage is ordering data you get from tsearch and thats
where openFTS comes. But you have to make some middle ware
which is great, but we need to focus on other problems not on
middle ware.

Moving this to C would be great but not solution to all of us
we want to meka our searches good.

I think relkov and relor is good for start and should be
going that way. I think that everybody can very simple
acomplish hilightning and generation of headlines once they
get result ordered.

As i say in my mail before and Oleg ask me "Could you
elaborate this ?". I try to make some changes openFTS special
in relkov and relkor. But i'm not god in advanced C
programing so i spend a lot of time to find out what exactly
code does.

And here is my idea what would be great if this is possible
to make, because i don't realy know how pg internaly works.

Let say we create some table where we want to use full text
search.

CREATE table .....
..
mycolumn varchar,
another_column varchar,
....
fulltext(mycolumn,another_column)

}
the system then make all necessary index tables where those
positions would be saved when some data is inserted. I don't
know if this is possible to make somwehere in backgound so
user don actualy se those tables, but this is not a problem.

Parsing search words can anybody easily make in their own
language. Or he could use OpenFTS functionality. I made it
for PHP. So when you have those search words we passed it to
sql query.

something like this.

SELECT mycolumn,another_column FROM mytable WHERE mycolumn @
'search string' AND another_column @ 'search string';

This is done by tsearch and we get data searched but not
orderd by relevance.

For that we add something in that way

SELECT mycolumn,another_column,rank() AS sumofrank FROM mytable WHERE....... ORDER my
sumofrank

I'll write this rank here for better understanding

rank({mycolumn=>0.01},{another_column=>0.001},'search string') AS sumofrank

This would read that mycolumn have base weight 0.01 and
another column 0.001, so if search string is found in beginig
of another column it would be ranked lower than same string
found in mycolumn in the middle of it. Those weight could be
summed. With this could be possible to make order what column
is more important not only generaly but for every query we
make.

Sintax is just for easier understanding what i'm trying to
solve.

So far we orderd aout data and then we could make hilighning
and stuff in any language we want.

I hope everybody undestands what is my idea and i would like
to help i just have to learn more from the code and what
internaly is done with that data.

I make some ranking in PHP but it was not fast becase there
were a lot of data etc and php is not as fast as C is. But i
get pretty results and also the concept how to rank
something.

I could also be made some rule engine how to rank something,
but i think that first of all we have to start on something
trivial and simple. And when this works we move to advanced.
Let say we check if text is bold or is in CAPS...

--
bye,
 Uros

Tuesday, January 28, 2003, 8:11:36 PM, you wrote:

OB> On Tue, 28 Jan 2003, Tomaz Borstnar wrote:

>> At 13:47 28.1.2003 +0300, Oleg Bartunov wrote the following message:
>> >We want to keep tsearch as simple as it's and now we just add
>> >better and friendly configurability. Do we need complicate tsearch ?
>>
>> Sometimes you need that because some other app is putting data into database.
>>

OB> So, you'll end up with something like OpenFTS, which was designed as
OB> *engine* to be integrated into  other apps. The real problem is that
OB> OpenFTS is written in perl and porting to  other languages is
OB> difficult task. new tsearch already has some features of OpenFTS and
OB> we're slowly moving to idea we should rewrite OpenFTS in 'C',
OB> so writing interfaces would be much simpler.
OB> There is major problem with moving ALL features of OpenFTS to tsearch
OB> we don't know how to resolve - generation of headlines, text fragments
OB> with hilighted query terms. Once we resolve that we could concentrate
OB> on tsearch with ranking support.


Re: tsearch comments

От
Uroš Gruber
Дата:
Hi!

I think that this would be nice. OpenFTS is great, but i would
be great if this would be implement in real pg functions.

I think that indexim would be great if pg make it by itself.

Also it could be great if we could define order of weight of
columns.

bye Uros

I
On 28.01.2003 at 11:53:26, Oleg Bartunov <oleg@sai.msu.su>
wrote:

> On Tue, 28 Jan 2003 sector119@mail.ru wrote:
>
> > HI
> >
> > will we see sort by relevance at tsearch alpha version? :)
> >
>
> not sure. We concentrate our efforts, well, Teodor is working
> on
> better configurability of tsearch like OpenFTS does.
>
> It\\\'s not difficult to add rather naive relevance based on
> position
> of lexem in document, for example. The question is do you
like
> such
> kind of relevancy ? Real ranking support (as in OpenFTS)
> require
> separate tables to maintain coordinate information.
> We want to keep tsearch as simple as it\\\'s and now we just
add
> better and friendly configurability. Do we need complicate
> tsearch ?
> We already have OpenFTS which has most features people
> requested.
>
>
>     Regards,
>         Oleg
>
>
> >
>
>     Regards,
>         Oleg
> _____________________________________________________________
> Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
> Sternberg Astronomical Institute, Moscow University (Russia)
> Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
> phone: +007(095)939-16-83, +007(095)939-23-83
>
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to
> majordomo@postgresql.org
>


--
Any excuse will serve a tyrant.
        -- Aesop


Re: tsearch comments

От
"eric@did-it.com"
Дата:
Hi,

I guess what we're looking for is something on the order (as much as I
hate using it as a reference) of MySQL's full text search which does
offer some ranking.

Just putting ranking alone in tsearch would be a huge benefit. Users can
then decide in their own language how to display results, especially
since those results may not necessarily require titles or description
fragments.

For example, we have several huge tables that have the following
columns:

> id
> tbltype
> title
> description

Basically, our customer will lookup words that are contained in title
and description, so we make an additional table like:

> id
> tblid (id of the source table)
> tblsource (which table)
> content (txtidx)

Then we can use tsearch to search the second table (we do now), and once
we retrieve the id's that we want, we can display results from one or
more source tables. Just putting in ranking in tsearch would solve all
these problems.

- Ericson Smith
http://www.did-it.com
http://www.weightlossfriends.com


On Tue, 2003-01-28 at 14:00, Oleg Bartunov wrote:
> On Tue, 28 Jan 2003, Uros Gruber wrote:
>
> > Hi!
> >
> > I think that this would be nice. OpenFTS is great, but i would
> > be great if this would be implement in real pg functions.
> >
> > I think that indexim would be great if pg make it by itself.
> >
> > Also it could be great if we could define order of weight of
> > columns.
>
> Could you elaborate this ?
>
> >
> > bye Uros
> >
> > I
> > On 28.01.2003 at 11:53:26, Oleg Bartunov <oleg@sai.msu.su>
> > wrote:
> >
> > > On Tue, 28 Jan 2003 sector119@mail.ru wrote:
> > >
> > > > HI
> > > >
> > > > will we see sort by relevance at tsearch alpha version? :)
> > > >
> > >
> > > not sure. We concentrate our efforts, well, Teodor is working
> > > on
> > > better configurability of tsearch like OpenFTS does.
> > >
> > > It\\\'s not difficult to add rather naive relevance based on
> > > position
> > > of lexem in document, for example. The question is do you
> > like
> > > such
> > > kind of relevancy ? Real ranking support (as in OpenFTS)
> > > require
> > > separate tables to maintain coordinate information.
> > > We want to keep tsearch as simple as it\\\'s and now we just
> > add
> > > better and friendly configurability. Do we need complicate
> > > tsearch ?
> > > We already have OpenFTS which has most features people
> > > requested.
> > >
> >
> >
> > ---------------------------(end of broadcast)---------------------------
> > TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
> >
>
>     Regards,
>         Oleg
> _____________________________________________________________
> Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
> Sternberg Astronomical Institute, Moscow University (Russia)
> Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
> phone: +007(095)939-16-83, +007(095)939-23-83
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo@postgresql.org so that your
> message can get through to the mailing list cleanly



Re: tsearch comments

От
"eric@did-it.com"
Дата:
Good ideas,

I second them entirely.

We're using tsearch as well, and would like to see ranking in this
module. It would be great to have it on the DB level, instead of
middleware (perl, etc). Apparently there is built in proximity ranking?

I guess one of the problems with using the OpenFTS front end is
integration into existing systems that use different languages and
methodologies. Most of the time, you tend to want those results
delivered within your application, instead of outside of it in a general
purpose search engine.

Also, in many typical applications, the data is not necessarily
title/description based, for instance, we need to lookup data from just
a title, or sometimes a content column, and maybe even from our keywords
table.

- Ericson Smith
http://www.weightlossfriends.com


On Tue, 2003-01-28 at 17:48, Uros Gruber wrote:
> Hi!
>
> OpenFTS is great so far. But for example. We are working on
> directory engine and we would like to use some ranking on
> data we get from tsearch. The we have data like Page caption,
> description, keywords, url, page content .... and then we
> have another project we we search on complitely different
> kind of data.
>
> Using full text search in this scenario is very easy to use,
> because everything is in db and this is done on db level.
> Developer do not need to worry about that how to index
> something. It great because you can say this column is
> fulltext indexed.
>
> Second stage is ordering data you get from tsearch and thats
> where openFTS comes. But you have to make some middle ware
> which is great, but we need to focus on other problems not on
> middle ware.
>
> Moving this to C would be great but not solution to all of us
> we want to meka our searches good.
>
> I think relkov and relor is good for start and should be
> going that way. I think that everybody can very simple
> acomplish hilightning and generation of headlines once they
> get result ordered.
>
> As i say in my mail before and Oleg ask me "Could you
> elaborate this ?". I try to make some changes openFTS special
> in relkov and relkor. But i'm not god in advanced C
> programing so i spend a lot of time to find out what exactly
> code does.
>
> And here is my idea what would be great if this is possible
> to make, because i don't realy know how pg internaly works.
>
> Let say we create some table where we want to use full text
> search.
>
> CREATE table .....
> ..
> mycolumn varchar,
> another_column varchar,
> ....
> fulltext(mycolumn,another_column)
>
> }
> the system then make all necessary index tables where those
> positions would be saved when some data is inserted. I don't
> know if this is possible to make somwehere in backgound so
> user don actualy se those tables, but this is not a problem.
>
> Parsing search words can anybody easily make in their own
> language. Or he could use OpenFTS functionality. I made it
> for PHP. So when you have those search words we passed it to
> sql query.
>
> something like this.
>
> SELECT mycolumn,another_column FROM mytable WHERE mycolumn @
> 'search string' AND another_column @ 'search string';
>
> This is done by tsearch and we get data searched but not
> orderd by relevance.
>
> For that we add something in that way
>
> SELECT mycolumn,another_column,rank() AS sumofrank FROM mytable WHERE....... ORDER my
> sumofrank
>
> I'll write this rank here for better understanding
>
> rank({mycolumn=>0.01},{another_column=>0.001},'search string') AS sumofrank
>
> This would read that mycolumn have base weight 0.01 and
> another column 0.001, so if search string is found in beginig
> of another column it would be ranked lower than same string
> found in mycolumn in the middle of it. Those weight could be
> summed. With this could be possible to make order what column
> is more important not only generaly but for every query we
> make.
>
> Sintax is just for easier understanding what i'm trying to
> solve.
>
> So far we orderd aout data and then we could make hilighning
> and stuff in any language we want.
>
> I hope everybody undestands what is my idea and i would like
> to help i just have to learn more from the code and what
> internaly is done with that data.
>
> I make some ranking in PHP but it was not fast becase there
> were a lot of data etc and php is not as fast as C is. But i
> get pretty results and also the concept how to rank
> something.
>
> I could also be made some rule engine how to rank something,
> but i think that first of all we have to start on something
> trivial and simple. And when this works we move to advanced.
> Let say we check if text is bold or is in CAPS...
>
> --
> bye,
>  Uros
>
> Tuesday, January 28, 2003, 8:11:36 PM, you wrote:
>
> OB> On Tue, 28 Jan 2003, Tomaz Borstnar wrote:
>
> >> At 13:47 28.1.2003 +0300, Oleg Bartunov wrote the following message:
> >> >We want to keep tsearch as simple as it's and now we just add
> >> >better and friendly configurability. Do we need complicate tsearch ?
> >>
> >> Sometimes you need that because some other app is putting data into database.
> >>
>
> OB> So, you'll end up with something like OpenFTS, which was designed as
> OB> *engine* to be integrated into  other apps. The real problem is that
> OB> OpenFTS is written in perl and porting to  other languages is
> OB> difficult task. new tsearch already has some features of OpenFTS and
> OB> we're slowly moving to idea we should rewrite OpenFTS in 'C',
> OB> so writing interfaces would be much simpler.
> OB> There is major problem with moving ALL features of OpenFTS to tsearch
> OB> we don't know how to resolve - generation of headlines, text fragments
> OB> with hilighted query terms. Once we resolve that we could concentrate
> OB> on tsearch with ranking support.
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org



Re: tsearch comments

От
Oleg Bartunov
Дата:
On 28 Jan 2003, eric@did-it.com wrote:

> Hi,
>
> I guess what we're looking for is something on the order (as much as I
> hate using it as a reference) of MySQL's full text search which does
> offer some ranking.
>
> Just putting ranking alone in tsearch would be a huge benefit. Users can
> then decide in their own language how to display results, especially
> since those results may not necessarily require titles or description
> fragments.
>
> For example, we have several huge tables that have the following
> columns:
>
> > id
> > tbltype
> > title
> > description
>
> Basically, our customer will lookup words that are contained in title
> and description, so we make an additional table like:
>
> > id
> > tblid (id of the source table)
> > tblsource (which table)
> > content (txtidx)
>
> Then we can use tsearch to search the second table (we do now), and once
> we retrieve the id's that we want, we can display results from one or
> more source tables. Just putting in ranking in tsearch would solve all
> these problems.

Hmm, people used to concatenation to get the same result. Do you really
need that table ? Your problem doesn't relate to ranking of results.

We could add some ranking support based on local (per-document) statistics.
Keeping global statistics, for example, TFxIDF, would complicate tsearch
and maintaining of indices. Proximity ranking as in OpenFTS require
more options in tsearch configuration. Let us think about ranking later
after we implement friendly interface.

>
> - Ericson Smith
> http://www.did-it.com
> http://www.weightlossfriends.com
>
>
> On Tue, 2003-01-28 at 14:00, Oleg Bartunov wrote:
> > On Tue, 28 Jan 2003, Uros Gruber wrote:
> >
> > > Hi!
> > >
> > > I think that this would be nice. OpenFTS is great, but i would
> > > be great if this would be implement in real pg functions.
> > >
> > > I think that indexim would be great if pg make it by itself.
> > >
> > > Also it could be great if we could define order of weight of
> > > columns.
> >
> > Could you elaborate this ?
> >
> > >
> > > bye Uros
> > >
> > > I
> > > On 28.01.2003 at 11:53:26, Oleg Bartunov <oleg@sai.msu.su>
> > > wrote:
> > >
> > > > On Tue, 28 Jan 2003 sector119@mail.ru wrote:
> > > >
> > > > > HI
> > > > >
> > > > > will we see sort by relevance at tsearch alpha version? :)
> > > > >
> > > >
> > > > not sure. We concentrate our efforts, well, Teodor is working
> > > > on
> > > > better configurability of tsearch like OpenFTS does.
> > > >
> > > > It\\\'s not difficult to add rather naive relevance based on
> > > > position
> > > > of lexem in document, for example. The question is do you
> > > like
> > > > such
> > > > kind of relevancy ? Real ranking support (as in OpenFTS)
> > > > require
> > > > separate tables to maintain coordinate information.
> > > > We want to keep tsearch as simple as it\\\'s and now we just
> > > add
> > > > better and friendly configurability. Do we need complicate
> > > > tsearch ?
> > > > We already have OpenFTS which has most features people
> > > > requested.
> > > >
> > >
> > >
> > > ---------------------------(end of broadcast)---------------------------
> > > TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
> > >
> >
> >     Regards,
> >         Oleg
> > _____________________________________________________________
> > Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
> > Sternberg Astronomical Institute, Moscow University (Russia)
> > Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
> > phone: +007(095)939-16-83, +007(095)939-23-83
> >
> >
> > ---------------------------(end of broadcast)---------------------------
> > TIP 3: if posting/reading through Usenet, please send an appropriate
> > subscribe-nomail command to majordomo@postgresql.org so that your
> > message can get through to the mailing list cleanly
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
>

    Regards,
        Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83


Re: tsearch comments

От
"eric@did-it.com"
Дата:
Oleg,

We actually have several somewhat similar tables (A, B, C, D, E...) that
have some textual/varchar content. Thus we make a search table Z that
concatenates the textual info from the first tables. Sure, we could
probably use unions and such the like, but performance reasons prohibit
that scenario :-)

Its much better to search the search table, then show the relevant data
from the source tables based on ranked results.

- Ericson Smith


On Wed, 2003-01-29 at 03:37, Oleg Bartunov wrote:
> On 28 Jan 2003, eric@did-it.com wrote:
>
> > Hi,
> >
> > I guess what we're looking for is something on the order (as much as I
> > hate using it as a reference) of MySQL's full text search which does
> > offer some ranking.
> >
> > Just putting ranking alone in tsearch would be a huge benefit. Users can
> > then decide in their own language how to display results, especially
> > since those results may not necessarily require titles or description
> > fragments.
> >
> > For example, we have several huge tables that have the following
> > columns:
> >
> > > id
> > > tbltype
> > > title
> > > description
> >
> > Basically, our customer will lookup words that are contained in title
> > and description, so we make an additional table like:
> >
> > > id
> > > tblid (id of the source table)
> > > tblsource (which table)
> > > content (txtidx)
> >
> > Then we can use tsearch to search the second table (we do now), and once
> > we retrieve the id's that we want, we can display results from one or
> > more source tables. Just putting in ranking in tsearch would solve all
> > these problems.
>
> Hmm, people used to concatenation to get the same result. Do you really
> need that table ? Your problem doesn't relate to ranking of results.
>
> We could add some ranking support based on local (per-document) statistics.
> Keeping global statistics, for example, TFxIDF, would complicate tsearch
> and maintaining of indices. Proximity ranking as in OpenFTS require
> more options in tsearch configuration. Let us think about ranking later
> after we implement friendly interface.
>
> >
> > - Ericson Smith
> > http://www.did-it.com
> > http://www.weightlossfriends.com
> >
> >
> > On Tue, 2003-01-28 at 14:00, Oleg Bartunov wrote:
> > > On Tue, 28 Jan 2003, Uros Gruber wrote:
> > >
> > > > Hi!
> > > >
> > > > I think that this would be nice. OpenFTS is great, but i would
> > > > be great if this would be implement in real pg functions.
> > > >
> > > > I think that indexim would be great if pg make it by itself.
> > > >
> > > > Also it could be great if we could define order of weight of
> > > > columns.
> > >
> > > Could you elaborate this ?
> > >
> > > >
> > > > bye Uros
> > > >
> > > > I
> > > > On 28.01.2003 at 11:53:26, Oleg Bartunov <oleg@sai.msu.su>
> > > > wrote:
> > > >
> > > > > On Tue, 28 Jan 2003 sector119@mail.ru wrote:
> > > > >
> > > > > > HI
> > > > > >
> > > > > > will we see sort by relevance at tsearch alpha version? :)
> > > > > >
> > > > >
> > > > > not sure. We concentrate our efforts, well, Teodor is working
> > > > > on
> > > > > better configurability of tsearch like OpenFTS does.
> > > > >
> > > > > It\\\'s not difficult to add rather naive relevance based on
> > > > > position
> > > > > of lexem in document, for example. The question is do you
> > > > like
> > > > > such
> > > > > kind of relevancy ? Real ranking support (as in OpenFTS)
> > > > > require
> > > > > separate tables to maintain coordinate information.
> > > > > We want to keep tsearch as simple as it\\\'s and now we just
> > > > add
> > > > > better and friendly configurability. Do we need complicate
> > > > > tsearch ?
> > > > > We already have OpenFTS which has most features people
> > > > > requested.
> > > > >
> > > >
> > > >
> > > > ---------------------------(end of broadcast)---------------------------
> > > > TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
> > > >
> > >
> > >     Regards,
> > >         Oleg
> > > _____________________________________________________________
> > > Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
> > > Sternberg Astronomical Institute, Moscow University (Russia)
> > > Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
> > > phone: +007(095)939-16-83, +007(095)939-23-83
> > >
> > >
> > > ---------------------------(end of broadcast)---------------------------
> > > TIP 3: if posting/reading through Usenet, please send an appropriate
> > > subscribe-nomail command to majordomo@postgresql.org so that your
> > > message can get through to the mailing list cleanly
> >
> >
> >
> > ---------------------------(end of broadcast)---------------------------
> > TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
> >
>
>     Regards,
>         Oleg
> _____________________________________________________________
> Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
> Sternberg Astronomical Institute, Moscow University (Russia)
> Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
> phone: +007(095)939-16-83, +007(095)939-23-83
>