Re: big joins not converging

Поиск
Список
Период
Сортировка
От fork
Тема Re: big joins not converging
Дата
Msg-id loom.20110311T173103-318@post.gmane.org
обсуждение исходный текст
Ответ на big joins not converging  (Dan Ancona <da@vizbang.com>)
Список pgsql-performance
Dan Ancona <da <at> vizbang.com> writes:

>  his is a choice between
> developing some in-house capacity for this and sending people to
> various vendors so we'll probably lean on the vendors for now, at
> least while we work on it.

I would try to do the record matching in house and see how far you get, even if
you are talking to vendors concurrently.  You might get lucky, and you will
learn a lot about your data and how much to expect and pay for vendor solutions.

I would: Try building multi column indices on both tables for what you think are
the same rows, and match deterministically (if you have a key like social
security, then do this again on full names).  Examine your data to see what
hits, what misses, what hits multiple.  If you know there is a "good" and an
"iffy" table, you can use a left outer, otherwise you need a full outer.   Then
put all your leftovers from each into new tables, and try again with something
fuzzy.

If you build the indices and use "=" and it is still slow, ask again here --
that shouldn't happen.

> And you're right fork, Record Linkage is in fact an entire academic
> discipline!

Indeed.  Look for "blocking" and "editing" with your data first, I think.

I find this problem pretty interesting, so I would love to hear your results.  I
am right now matching building permits to assessor parcels....  I wish I was
using PG ...

В списке pgsql-performance по дате отправления:

Предыдущее
От: "hans wulf"
Дата:
Сообщение: ANTI-JOIN needs table, index scan not possible?
Следующее
От: Maciek Sakrejda
Дата:
Сообщение: Re: ANTI-JOIN needs table, index scan not possible?