Thread: problem with splitting a string

problem with splitting a string

From
Werner Echezuria
Date:
Hi,

I'm trying to develop a contrib module in order to parse sqlf queries, I'm using lemon as a LALR parser generator (because I think it's easier than bison) and re2c (because I think it's easier than flex) but when I try to split the string into words postgres add some weird characters (this works in pure gcc), I write something like "CREATE FUZZY PREDICATE joven ON 0..120 AS (0,0,35,120);", but postgresql adds a character like � at the end of "joven" and the others words.

The code I use to split the string is:

void parse_query(char *str, const char **sqlf){

    parse_words(str);
    *sqlf=fuzzy_query;
}
void parse_words(char *str){
    char *word;
    int token;
    const char semicolon =';';
    const char dot='.';
    const char comma=',';
    const char open_bracket='(';
    const char close_bracket=')';
    struct Token sToken;

    int i = 0;

    void* pParser = ParseAlloc (malloc);

    while(str[i] !='\0'){
        int c=0;

        word=(char *)malloc(sizeof(char));

        if(isspace(str[i]) || str[i]==semicolon){
            i++;
            continue;
        }

        if (str[i]==open_bracket || str[i]==close_bracket ||
            str[i]==dot || str[i]==comma){
            word[c]= str[i];
            i++;
            token=scan(word, strlen(word));
            Parse(pParser, token, sToken);
            continue;
        }else{
            while(!isspace(str[i]) && str[i]!=semicolon && str[i]!='\0' &&
                  str[i]!=open_bracket && str[i]!=close_bracket &&
                  str[i]!=dot && str[i]!=comma){
                word[c++] = str[i++];
            }
        }

        token=scan(word,strlen(word));

        if (token==PARAMETRO){
            //TODO: I don't know why it needs the malloc function again, all I know is it's working
            const char *param=word;
            word=(char *)malloc(sizeof(char));
            sToken.z=param;
        }

        Parse(pParser, token, sToken);
        free(word);
    }
    Parse(pParser, 0, sToken);
    ParseFree(pParser, free);

}

Header:

#ifndef SQLF_H_
#define SQLF_H_

typedef struct Token {
  const char *z;
  int value;
  unsigned n;
} Token;
void parse_query(char *str, const char **sqlf);
void parse_words(char *str);
int scan(char *s, int l);

#endif /* SQLF_H_ */

Screen:

postgres=# select * from fuzzy.sqlf('CREATE FUZZY PREDICATE joven ON 0..120 AS (0,0,35,120);'::text);
ERROR:  syntax error at or near ""
LINE 1: INSERT INTO fuzzydb.pg_fuzzypredicate VALUES(joven,0�
                                                               �,120
                                                                    ...
                                                         ^
QUERY:  INSERT INTO fuzzydb.pg_fuzzypredicate VALUES(joven,0�
                                                               �,120
                                                                    �,0�
                                                                          �,0�
                                                                                �,35
                                                                                      �,120
                                                                                            �);

Thanks for any help

Re: problem with splitting a string

From
Tom Lane
Date:
Werner Echezuria <wercool@gmail.com> writes:
> I'm trying to develop a contrib module in order to parse sqlf queries, I'm
> using lemon as a LALR parser generator (because I think it's easier than
> bison) and re2c (because I think it's easier than flex) but when I try to
> split the string into words postgres add some weird characters (this works
> in pure gcc), I write something like "CREATE FUZZY PREDICATE joven ON 0..120
> AS (0,0,35,120);", but postgresql adds a character like � at the end of
> "joven" and the others words.

Maybe you are expecting 'text' values to be null-terminated?  They are
not.  You might look into using TextDatumGetCString or related functions
to convert.
        regards, tom lane

PS: the chances of us accepting a contrib module that requires
significant unusual infrastructure to build seem pretty low from
where I sit.  You're certainly free to do whatever you want for
private work, or even for a pgfoundry project --- but if you do
have ambitions of this eventually becoming contrib, "it's easier"
is not going to be sufficient rationale to not use bison/flex.
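
On the point above about 'text' values not being null-terminated: the sketch below illustrates, in plain C, why a length-counted buffer has to be copied into a terminated string before functions such as strlen() can be used on it. It is only an illustration. The CountedBytes struct and counted_to_cstring() are hypothetical names made up for this example and do not reflect PostgreSQL's actual 'text' layout; inside a real extension the conversion would be done by TextDatumGetCString, as suggested above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for a length-counted value: the bytes in data are
 * NOT followed by '\0'. This is an illustration only, not PostgreSQL's
 * real 'text' representation.
 */
typedef struct CountedBytes
{
    size_t      len;    /* number of valid bytes in data */
    const char *data;   /* payload, no terminator */
} CountedBytes;

/*
 * Hypothetical helper: build a NUL-terminated copy that is safe to pass to
 * strlen() and similar functions. The caller frees the result.
 */
char *
counted_to_cstring(const CountedBytes *v)
{
    char *s;

    assert(v != NULL);
    s = malloc(v->len + 1);         /* payload plus room for '\0' */
    if (s == NULL)
        return NULL;
    memcpy(s, v->data, v->len);
    s[v->len] = '\0';               /* terminate explicitly */
    return s;
}
```

Calling strlen() directly on the payload would read past the counted bytes until it happened to hit a zero byte, which is one way trailing garbage characters can show up.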


Re: problem with splitting a string

From
Werner Echezuria
Date:
Hi,

Well, I use TextDatumGetCString in the main file, but the weird characters are still there.

This is the main file:

#include "postgres.h"
#include "fmgr.h"
#include "gram.h"
#include "sqlf.h"
#include "utils/builtins.h"

extern Datum sqlf(PG_FUNCTION_ARGS);

PG_MODULE_MAGIC;

PG_FUNCTION_INFO_V1(sqlf);

Datum
sqlf(PG_FUNCTION_ARGS){

    char        *query = TextDatumGetCString(PG_GETARG_DATUM(0));
    const char    *parse_str;
    char         *result;

    parse_query(query,&parse_str);

    result=parse_str;

    PG_RETURN_TEXT_P(cstring_to_text(result));
}

About the PS: OK, I understand that if I want this included as a contrib module I need to use bison/flex. I had never thought about that, but now I have a couple of questions:
What are the chances of it really being included in PostgreSQL as a contrib module?
Are there any requirements I have to follow?

2009/8/6 Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Werner Echezuria <wercool@gmail.com> writes:
> > I'm trying to develop a contrib module in order to parse sqlf queries, I'm
> > using lemon as a LALR parser generator (because I think it's easier than
> > bison) and re2c (because I think it's easier than flex) but when I try to
> > split the string into words postgres add some weird characters (this works
> > in pure gcc), I write something like "CREATE FUZZY PREDICATE joven ON 0..120
> > AS (0,0,35,120);", but postgresql adds a character like � at the end of
> > "joven" and the others words.
>
> Maybe you are expecting 'text' values to be null-terminated?  They are
> not.  You might look into using TextDatumGetCString or related functions
> to convert.
>
>                        regards, tom lane
>
> PS: the chances of us accepting a contrib module that requires
> significant unusual infrastructure to build seem pretty low from
> where I sit.  You're certainly free to do whatever you want for
> private work, or even for a pgfoundry project --- but if you do
> have ambitions of this eventually becoming contrib, "it's easier"
> is not going to be sufficient rationale to not use bison/flex.

Re: problem with splitting a string

From
Tom Lane
Date:
Werner Echezuria <wercool@gmail.com> writes:
> Well, I use TextDatumGetCString in the main file, but it remains with the
> weird characters.

Hmm, no ideas then.  Your interface code looks fine (making parse_str
const seems a bit strange, but it's not related to the problem at hand).
Given that the problems appear at token boundaries I'd guess that re2c
isn't behaving the way you expect, but I'm not familiar with that tool
so I can't give any specific advice.

> About the PS: Ok, I understand that if I want that you include this as a
> contrib module I need to use bison/flex, I never thought about it, but I now
> have a couple of questions:
> What are the chances to really include it in PostgreSQL as a contrib module?
> Are there any requirement I have to follow?

Well, it'd mainly be a question of whether there's enough interest out
there, which I can't judge.  From a project standpoint we just require
that it be BSD-licensed and not impose any undue new burden on
maintainers (thus not wanting new build tools), but beyond that it's a
matter of how many people might use it.
        regards, tom lane
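
On the guess that the trouble sits at token boundaries: independent of re2c, one thing that may be worth checking in the posted parse_words is the word buffer itself. It is allocated with malloc(sizeof(char)), which is a single byte, then filled with word[c++] = str[i++] for every character of the token, and finally passed to strlen(word) without a '\0' ever being stored. Whether this is the whole cause cannot be confirmed from the thread, but it would be consistent with stray bytes appearing at the end of each word. The sketch below shows one way to cut out each word into a correctly sized, terminated buffer; next_word() and is_punct_token() are hypothetical names introduced for this example and are not part of the original module.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: the single-character tokens used in the post. */
int
is_punct_token(char ch)
{
    return ch == '(' || ch == ')' || ch == '.' || ch == ',';
}

/*
 * Hypothetical helper: return the next word of str, starting at *pos, as a
 * freshly allocated NUL-terminated string, and advance *pos past it.
 * Returns NULL at end of input (or on allocation failure).
 * The caller frees the result.
 */
char *
next_word(const char *str, size_t *pos)
{
    size_t  start;
    size_t  len;
    char   *word;

    assert(str != NULL && pos != NULL);

    /* Skip whitespace and semicolons, as the original loop does. */
    while (str[*pos] != '\0' &&
           (isspace((unsigned char) str[*pos]) || str[*pos] == ';'))
        (*pos)++;

    if (str[*pos] == '\0')
        return NULL;

    start = *pos;
    if (is_punct_token(str[*pos]))
    {
        (*pos)++;                   /* one-character token */
    }
    else
    {
        while (str[*pos] != '\0' &&
               !isspace((unsigned char) str[*pos]) &&
               str[*pos] != ';' &&
               !is_punct_token(str[*pos]))
            (*pos)++;
    }

    len = *pos - start;
    word = malloc(len + 1);         /* the word plus room for '\0' */
    if (word == NULL)
        return NULL;
    memcpy(word, str + start, len);
    word[len] = '\0';               /* strlen(word) is now well defined */
    return word;
}
```

A caller would loop with a size_t position starting at 0, hand each returned word to scan(word, strlen(word)) and free it afterwards. Note that in the posted PARAMETRO branch sToken.z keeps pointing at the word, so that buffer would have to stay alive until the parser is done with it.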


Re: problem with splitting a string

From
Alvaro Herrera
Date:
Tom Lane wrote:

> Well, it'd mainly be a question of whether there's enough interest out
> there, which I can't judge.  From a project standpoint we just require
> that it be BSD-licensed and not impose any undue new burden on
> maintainers (thus not wanting new build tools), but beyond that it's a
> matter of how many people might use it.

What use is there for fuzzy predicates?  I think it would mainly be to
stop more students from coming up with new implementations of the same
thing over and over.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: problem with splitting a string

From
Werner Echezuria
Date:

> What use is there for fuzzy predicates?  I think it would mainly be to
> stop more students from coming up with new implementations of the same
> thing over and over.

Well, I'm sorry if none of us involved in these projects has yet explained the true usefulness of sqlf and fuzzy databases. I guess we focused only on the technical problem and never explained the theory.

For example, here is a paragraph from the paper "Flexible queries in relational databases":

   "This paper deals with this second type of "uncertainty" and is concerned essentially with
database language extensions in order to deal with more expressive requirements. Indeed,
consider a query such that, for instance, "retrieve the apartments which are not too expensive
and not too far from downtown". In such a case, there does not exist a definite threshold for
which the price becomes suddenly too high, but rather we have to discriminate between
prices which are perfectly acceptable for the user, and other prices, somewhat higher, which
are still more or less acceptable (especially if the apartment is close to downtown). Note that
the meaning of vague predicate expressions like "not too expensive" is context/user
dependent, rather than universal. Fuzzy set membership functions [26] are convenient tools
for modelling user's preference profiles and the large panoply of fuzzy set connectives can
capture the different user attitudes concerning the way the different criteria present in his/her
query compensate or not; see [4] for a unified presentation in the fuzzy set framework of the
existing proposals for handling flexible queries. Moreover in a given query, some part of the
request may be less important to fulfill (e.g., in the above example, the price requirement
may be judged more important than the distance to downtown); the handling of importance
leads to the need for weighted connectives, as it will be seen in the following."


I really think this could be useful, but it is sometimes difficult to implement, and I'm trying to find a different, easier way to do things.

regards