Re: Longest Common Subsequence in Postgres - Algorithm Challenge

Поиск
Список
Период
Сортировка
От Marcin Mańk
Тема Re: Longest Common Subsequence in Postgres - Algorithm Challenge
Дата
Msg-id 66418150-BB94-426C-81E1-5EDCF33A55EE@gmail.com
обсуждение исходный текст
Ответ на Re: Longest Common Subsequence in Postgres - Algorithm Challenge  (Michael Paquier <michael.paquier@gmail.com>)
Список pgsql-general

Dnia 9 lip 2013 o godz. 00:46 Michael Paquier <michael.paquier@gmail.com> napisał(a):

> On Tue, Jul 9, 2013 at 5:04 AM, Robert James <srobertjames@gmail.com> wrote:
>> On 7/8/13, hubert depesz lubaczewski <depesz@depesz.com> wrote:
>>> On Mon, Jul 08, 2013 at 09:09:26AM -0400, Robert James wrote:
>>>> I have two relations, where each relation has two fields, one
>>>> indicating a name and one indicating a position.  That is, each
>>>> relation defines a sequence.
>>>>
>>>> I need to determine their longest common subsequence.  Yes, I can do
>>>> this by fetching all the data into Java (or any other language) and
>>>> computing it using the standard LCS dynamic programming language.  But
>>>> I'd like to stay within Postgres.  Is there any way to do this?
>>>
>>> I'm not entirely sure I understand. Can you show us some sample data and
>>> expected output?
>>
>> Sure.  Borrowing a good example from
>> http://wordaligned.org/articles/longest-common-subsequence :
>>
>> Table A (val varchar primary key, pos integer):
>> 1, "C"
>> 2, "H"
>> 3, "I"
>> 4, "M"
>> 5, "P"
>> 6, "A"
>> 7, "N"
>> 8, "Z"
>> 9, "E"
>> 10, "E"
>>
>> Table B (val varchar primary key, pos integer):
>> 1, "H"
>> 2, "U"
>> 3, "M"
>> 4, "A"
>> 5, "N"
>>
>> SELECT LongestCommonSubsequence(A,B):
>> 1, "H"
>> 2, "M"
>> 3, "A"
>> 4, "N"
>>
>> (Common chars are in upper case:
>> cHiMpANzee
>> HuMAN
>> )
>>
>> The std dynamic programming algorithm is described at
>> http://en.wikipedia.org/wiki/Longest_common_subsequence_problem
> Your best bet on that would be to implement using either a procedural
> language (plpython, plperl that are in core, or even another one), or
> a C function and install it on server side with an extension. I'll do
> the latter if I were you. Roughly such a function would take in
> arguments the OIDs of the functions to compare (or their relation
> name), and return a set of rows describing the longest sequence found.
>
Also if you only need the length of lcs, you can derive it using the built in levenshtein with costs function

В списке pgsql-general по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: domains, case statements, functions: bug?
Следующее
От: Mike Christensen
Дата:
Сообщение: PERFORM statement