Re: how to optimize my c-extension functions
From | Pierre-Frédéric Caillaud |
---|---|
Subject | Re: how to optimize my c-extension functions |
Date | |
Msg-id | opskemlbpjcq72hf@musicbox |
In reply to | Re: how to optimize my c-extension functions (TJ O'Donnell <tjo@acm.org>) |
List | pgsql-general |
That's not what I meant... I meant, what does 'c1ccccc1C(=O)N' mean?

If the search operation is too slow, you can narrow it using standard postgres tools and then hand it down to your C functions.

Let me explain. I have no clue about this 'c1ccccc1C(=O)N' syntax, but I'll suppose you will be searching for things like:

1- molecule has N atoms of (whatever) element
2- molecule has N single or double or triple covalent bonds
3- molecule has such and such property

Then, if you can parse the 'c1ccccc1C(=O)N' string and say that all molecules that match it will also satisfy, for instance, condition 2 above, then you can have some fast searchable attributes in your database that mark all molecules satisfying condition 2, and you'll only need to run the C search function on these to get the real matches.

The idea is basically to narrow down the search to avoid calling the expensive operator on all rows.

If A and B are strings like your 'c1ccccc1C(=O)N', and all molecules satisfying B also satisfy A (thus B=>A or "B c A", B is contained in A in set notation), and you can very quickly (with an index) grab the molecules that satisfy A, and these are a significantly smaller number than the whole set, then you'll speed up your search a lot.

If you can find some more A's, so that B c A1, B c A2, B c A3, then B c (intersection of A1, A2, A3), which maps neatly to the GiST index on an integer array.

So you could have a set of basic conditions, maybe a hundred or so, which would all be tested on the search string to see which will apply to the molecules this search string would find, then you translate this into a GiST query.

Are my explanations making it clearer or just more obfuscated?

> The only type of search will be of the type:
>
> Select smiles,id from structure where
> oe_matches(smiles,'c1ccccc1C(=O)N');
>
> or joins with other tables e.g.
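
The narrowing idea described in this message (precompute cheap conditions per molecule, work out which of them the search string implies, and run the expensive matcher only on rows that carry all of them) can be sketched roughly as follows. This is a plain Python illustration of the logic, not PostgreSQL code: the three conditions, the `features`, `build_rows` and `search` helpers, and the substring-based `expensive_match` stand-in for `oe_matches` are all assumptions made up for the example. In the database, the subset test would presumably be answered by the GiST index on the integer array rather than by a loop.

```python
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical cheap conditions (the "A" sets), each identified by an integer.
# They are toy substring checks, chosen so that any molecule containing the
# query string necessarily satisfies every condition the query satisfies.
CONDITIONS: Dict[int, Callable[[str], bool]] = {
    1: lambda s: "N" in s,    # mentions a nitrogen
    2: lambda s: "=" in s,    # mentions a double bond
    3: lambda s: "c1" in s,   # opens an aromatic ring
}


def features(s: str) -> Set[int]:
    """Return the ids of all cheap conditions that the string satisfies."""
    return {cid for cid, test in CONDITIONS.items() if test(s)}


def expensive_match(smiles: str, query: str) -> bool:
    """Stand-in for the costly C function; here just a substring test."""
    return query in smiles


Row = Tuple[int, str, Set[int]]


def build_rows(smiles_list: List[str]) -> List[Row]:
    """Precompute the feature set once per molecule, as an indexed column would."""
    return [(i, s, features(s)) for i, s in enumerate(smiles_list, start=1)]


def search(rows: List[Row], query: str) -> Tuple[List[int], int]:
    """Return matching ids and the number of expensive calls actually made."""
    required = features(query)  # conditions every real match must satisfy
    hits: List[int] = []
    calls = 0
    for mol_id, smiles, feats in rows:
        if not required <= feats:  # cheap prefilter: row lacks a required condition
            continue
        calls += 1
        if expensive_match(smiles, query):
            hits.append(mol_id)
    return hits, calls


rows = build_rows([
    "c1ccccc1C(=O)N",
    "CCO",
    "c1ccccc1",
    "CC(=O)N",
    "Oc1ccccc1C(=O)N",
])
hits, calls = search(rows, "c1ccccc1C(=O)N")
print(hits, calls)
```

With these five toy rows, the prefilter lets only two of them through to the expensive matcher. The speedup depends entirely on how selective the chosen conditions are, and the prefilter is only safe when each condition really is implied by the search string.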