Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)
От | Tom Lane |
---|---|
Тема | Re: reducing the footprint of ScanKeyword (was Re: Large writable variables) |
Дата | |
Msg-id | 8228.1545937024@sss.pgh.pa.us обсуждение исходный текст |
Ответ на | Re: reducing the footprint of ScanKeyword (was Re: Large writable variables) (John Naylor <jcnaylor@gmail.com>) |
Ответы |
Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)
(John Naylor <jcnaylor@gmail.com>)
Re: reducing the footprint of ScanKeyword (was Re: Large writable variables) (John Naylor <jcnaylor@gmail.com>) Re: reducing the footprint of ScanKeyword (was Re: Large writable variables) (John Naylor <jcnaylor@gmail.com>) |
Список | pgsql-hackers |
John Naylor <jcnaylor@gmail.com> writes: > Barring additional bikeshedding on 0001, I'll plan on implementing > offset-based lookup for the other keyword types and retire the old > ScanKeyword. Once that's done, we can benchmark and compare with the > optimizations in 0002. Sounds like a plan. Assorted minor bikeshedding on v4-0001 (just from eyeballing it, I didn't test it): +/* Like ScanKeywordLookup, but uses offsets into a keyword string. */ +int +ScanKeywordLookupOffset(const char *string_to_lookup, + const char *kw_strings, Not really "like" it, since the return value is totally different and so is the representation of the keyword list. I realize that your plan is probably to get rid of ScanKeywordLookup and then adapt the latter's comment for this code, but don't forget that you need to adjust said comment. +/* Payload data for keywords */ +typedef struct ScanKeywordAux +{ + int16 value; /* grammar's token code */ + char category; /* see codes above */ +} ScanKeywordAux; There isn't really any point in changing category to "char", because alignment considerations will mandate that sizeof(ScanKeywordAux) be a multiple of 2 anyway. With some compilers we could get around that with a pragma to force non-aligned storage, but doing so would be a net loss on most non-Intel architectures. If you really are hot about saving that other 440 bytes, the way to do it would be to drop the struct entirely and use two parallel arrays, an int16[] for value and a char[] (or better uint8[]) for category. Those would be filled by reading kwlist.h twice with different definitions for PG_KEYWORD. Not sure it's worth the trouble though --- in particular, not clear that it's a win from the standpoint of number of cache lines touched. diff --git a/src/pl/plpgsql/src/.gitignore b/src/pl/plpgsql/src/.gitignore @@ -1,3 +1,4 @@ +/*kwlist_d.h Not a fan of using wildcards in .gitignore files, at least not when there's just one or two files you intend to match. # Force these dependencies to be known even without dependency info built: -pl_gram.o pl_handler.o pl_comp.o pl_exec.o pl_funcs.o pl_scanner.o: plpgsql.h pl_gram.h plerrcodes.h +pl_gram.o pl_handler.o pl_comp.o pl_exec.o pl_funcs.o pl_scanner.o: plpgsql.h pl_gram.h plerrcodes.h pl_unreserved_kwlist_d.h Hm, do we really need any more than pl_scanner.o to depend on that header? +/* FIXME: Have to redefine this symbol for the WIP. */ +#undef PG_KEYWORD +#define PG_KEYWORD(kwname, value, category) {value, category}, + +static const ScanKeywordAux unreserved_keywords[] = { +#include "pl_unreserved_kwlist.h" }; The category isn't useful for this keyword list, so couldn't you just make this an array of uint16 values? diff --git a/src/pl/plpgsql/src/pl_unreserved_kwlist.h b/src/pl/plpgsql/src/pl_unreserved_kwlist.h +/* name, value, category */ +PG_KEYWORD("absolute", K_ABSOLUTE, UNRESERVED_KEYWORD) Likewise, I'd just have these be two-argument macros. There's no reason for the various kwlist.h headers to agree on the number of payload arguments for PG_KEYWORD. diff --git a/src/tools/gen_keywords.pl b/src/tools/gen_keywords.pl + elsif ($arg =~ /^-o/) + { + $output_path = length($arg) > 2 ? substr($arg, 2) : shift @ARGV; + } My perl-fu is not great, but it looks like this will accept arguments like "-ofilename", which is a style I don't like at all. I'd rather either insist on the filename being separate or write the switch like "-o=filename". Also, project style when taking both forms is usually more like -o filename --offset=filename +$kw_input_file =~ /((\w*)kwlist)\.h/; +my $base_filename = $1; +$prefix = $2 if !defined $prefix; Hmm, what happens if the input filename does not end with "kwlist.h"? +# Parse keyword header for names. +my @keywords; +while (<$kif>) +{ + if (/^PG_KEYWORD\("(\w+)",\s*\w+,\s*\w+\)/) This is assuming more than it should about the number of arguments for PG_KEYWORD, as well as what's in them. I think it'd be sufficient to match like this: if (/^PG_KEYWORD\("(\w+)",/) +Options: + -o output path + -p optional prefix for generated data structures This usage message is pretty vague about how you write the options (cf gripe above). I looked very briefly at v4-0002, and I'm not very convinced about the "middle" aspect of that optimization. It seems unmaintainable, plus you've not exhibited how the preferred keywords would get selected in the first place (wiring them into the Perl script is surely not acceptable). If you want to pursue that, please separate it into an 0002 that just adds the letter-range aspect and then an 0003 that adds the "middle" business on top. Then we can do testing to see whether either of those ideas are worthwhile. regards, tom lane
В списке pgsql-hackers по дате отправления:
Предыдущее
От: Andrew DunstanДата:
Сообщение: Re: reducing the footprint of ScanKeyword (was Re: Large writablevariables)