Re: proposal: possibility to read dumped table's name from file

From Pavel Stehule
Subject Re: proposal: possibility to read dumped table's name from file
Date
Msg-id CAFj8pRCsZuKRRdqZoYYo_wW-YjpWGA_ie9nhwJRd9E+GmsShrQ@mail.gmail.com
In response to Re: proposal: possibility to read dumped table's name from file  (Justin Pryzby <pryzby@telsasoft.com>)
Responses Re: proposal: possibility to read dumped table's name from file  (Justin Pryzby <pryzby@telsasoft.com>)
List pgsql-hackers


On Wed, Jul 1, 2020 at 23:24, Justin Pryzby <pryzby@telsasoft.com> wrote:
On Thu, Jun 11, 2020 at 09:36:18AM +0200, Pavel Stehule wrote:
> On Wed, Jun 10, 2020 at 0:30, Justin Pryzby <pryzby@telsasoft.com> wrote:
> > > +                                             /* ignore empty rows */
> > > +                                             if (*line != '\0')
> >
> > Maybe: if line=='\0': continue
> > We should also support comments.

Comment support is still missing but easily added :)
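
As an illustration of how small the comment support could be, here is a minimal sketch. The helper name filter_line_is_ignorable and the '#' comment marker are assumptions for this example and are not taken from the patch.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: returns true when a filter line carries no rule,
 * i.e. it is empty, whitespace-only, or a comment.
 * The '#' comment marker is an assumption, not something the patch defines.
 */
static bool
filter_line_is_ignorable(const char *line)
{
    /* skip leading blanks so that indented comments are recognized too */
    line += strspn(line, " \t\r\n");

    return *line == '\0' || *line == '#';
}
```

The reading loop would then simply `continue` whenever this returns true.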

I tried this patch and it works for my purposes.

Also, your getline is dynamically re-allocating lines of arbitrary length.
Possibly that's not needed.  We'll typically read "+t schema.relname", which is
132 chars.  Maybe it's sufficient to do
char buf[1024];
fgets(buf);
if strchr(buf, '\n') == NULL: error();
ret = pstrdup(buf);
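
Spelled out as compilable C, the fixed-buffer idea above might look roughly like the sketch below. The function name read_line_fixed and the too_long flag are hypothetical, and plain malloc stands in for pstrdup so the example does not depend on PostgreSQL headers.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical sketch of the fixed-size variant: read one line into a
 * 1024-byte stack buffer and return a heap copy without the newline.
 * Returns NULL at end of input, on allocation failure, or when the line
 * does not fit; *too_long is set to 1 only in the last case.
 */
static char *
read_line_fixed(FILE *fp, int *too_long)
{
    char    buf[1024];
    char   *nl;
    char   *copy;

    *too_long = 0;

    if (fgets(buf, sizeof(buf), fp) == NULL)
        return NULL;            /* end of input or read error */

    nl = strchr(buf, '\n');
    if (nl != NULL)
        *nl = '\0';             /* strip the newline */
    else if (!feof(fp))
    {
        /* no newline and not at end of file: the line overflowed buf */
        *too_long = 1;
        return NULL;
    }

    copy = malloc(strlen(buf) + 1);
    if (copy != NULL)
        strcpy(copy, buf);
    return copy;
}
```

Note that this version still allocates once per line for the returned copy.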

63 bytes is the maximum effective identifier length, but it is not the maximum length of an identifier as written. A 1024-byte buffer would very probably be enough for everything, but I do not want to introduce another magic limit, especially when the dynamic implementation is not too hard.

A table name can be very long. Sometimes table names are stored in external storage at full length, and it would not be practical to require truncating them in the filter file.

For this case it is very effective, because the resized (enlarged) buffer is reused for the following rows, so realloc should not be called often. When I have to choose between two implementations of similar complexity, I prefer the more dynamic code without hardcoded limits. The dynamic approach has no overhead.


In any case, you could have getline return a char* and (rather than following
GNU) no need to take char**, int* parameters to conflate inputs and outputs.

No, it has a special benefit: it eliminates the short malloc/free cycle. When a line is longer, the buffer (and its recorded size) is increased, and for later rows of the same or smaller size no realloc is necessary.
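
The buffer reuse described here can be sketched as follows. This is only an illustration of the calling convention, not the code from the patch; the name read_line_dynamic and the initial capacity of 64 bytes are assumptions.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical reader modeled on the GNU getline() calling convention.
 * *buf and *cap are both input and output: the caller keeps them across
 * calls, so a buffer enlarged for one long line is reused for later lines.
 * Returns the line length (newline stripped), or -1 at end of input or
 * when memory cannot be allocated.
 */
static long
read_line_dynamic(char **buf, size_t *cap, FILE *fp)
{
    size_t  len = 0;
    int     c;

    while ((c = fgetc(fp)) != EOF)
    {
        /* keep room for this character plus the terminating NUL */
        if (len + 2 > *cap)
        {
            size_t  newcap = (*cap == 0) ? 64 : *cap * 2;
            char   *tmp = realloc(*buf, newcap);

            if (tmp == NULL)
                return -1;
            *buf = tmp;
            *cap = newcap;
        }

        if (c == '\n')
            break;
        (*buf)[len++] = (char) c;
    }

    if (c == EOF && len == 0)
        return -1;              /* nothing left to read */

    (*buf)[len] = '\0';
    return (long) len;
}
```

A caller starts with `char *buf = NULL; size_t cap = 0;`, calls the function in a loop, and frees buf once at the end. After the longest line has been seen, no further realloc happens.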


I realized that --filter has an advantage over the previous implementation
(with multiple --exclude-* and --include-*) in that it's possible to use stdin
for includes *and* excludes.

Yes, it looks like the better choice.


By chance, I had the opportunity yesterday to re-use with rsync a regex that
I'd previously been using with pg_dump and grep.  What this patch calls
"--filter" in rsync is called "--filter-from".  rsync's --filter-from rejects
filters of length longer than max filename, so I had to split it up into
multiple lines instead of using regex alternation ("|").  This option is a
close parallel in pg_dump.

We can talk about the option name. Maybe "--filter-from" is better than just "--filter".

Regards

Pavel


 

--
Justin
