Re: big joins not converging

Поиск
Список
Период
Сортировка
От Steve Atkins
Тема Re: big joins not converging
Дата
Msg-id 6C998CEA-5A42-4DA9-B5E0-33C17C1BF69E@blighty.com
обсуждение исходный текст
Ответ на big joins not converging  (Dan Ancona <da@vizbang.com>)
Список pgsql-performance
On Mar 10, 2011, at 1:25 PM, Dan Ancona wrote:

> Hi postgressers -
>
> As part of my work with voter file data, I pretty regularly have to join one large-ish (over 500k rows) table to
another.Sometimes this is via a text field (countyname) + integer (voter id). I've noticed sometimes this converges and
sometimesit doesn't, seemingly regardless of how I index things. So I'm looking for general thoughts on the joining of
largetables, but also running into a specific issue with the following slightly different query: 
>
> This one is between two tables that are a 754k row list of voters and a 445k row list of property owners. (I'm trying
tofind records where the owner name matches the voter name at the same address.) I have btree single column indices
builton all the relevant fields, and multicolumn indices built across all the columns I'm matching. The full schemas of
bothtables are below. The machine is an older-ish (3 years ago) dual-core pentium w/ 4GB RAM running FreeBSD, more
detailsbelow. 
>
> This is the query I've come up with so far:
>
> explain analyze
> update vanalameda set ownerflag = 'exact'
>   from aralameda where
>   vanalameda.streetno ~~ aralameda.streetnum and
>   vanalameda.streetname ~~ aralameda.streetname and
>   vanalameda.lastname ~~ aralameda.ownername and
>   vanalameda.firstname ~~ aralameda.ownername;
>
> If I include the analyze, this didn't complete after running overnight. If I drop the analyze and just explain, I get
this:
>
> "Nested Loop  (cost=46690.74..15384448712.74 rows=204 width=204)"
> "  Join Filter: (((vanalameda.streetno)::text ~~ (aralameda.streetnum)::text) AND ((vanalameda.streetname)::text ~~
(aralameda.streetname)::text)AND ((vanalameda.lastname)::text ~~ (aralameda.ownername)::text) AND
((vanalameda.firstname)::text~~ (aralameda.ownername)::text))" 
> "  ->  Seq Scan on vanalameda  (cost=0.00..26597.80 rows=734780 width=204)"
> "  ->  Materialize  (cost=46690.74..58735.87 rows=444613 width=113)"
> "        ->  Seq Scan on aralameda  (cost=0.00..38647.13 rows=444613 width=113)"
>
> One general question: does the width of the tables (i.e. the numbers of columns not being joined and the size of
thosefields) matter? The tables do have a lot of extra columns that I could slice out. 
>

Is there any reason you're using '~~' to compare values, rather than '='?

If you're intentionally using LIKE-style comparisons then there are some other things you can do, but I don't think you
meanto do that, for streeno and streetname anyway. 

Switching to an equality comparison should let your query use an index, most usefully one on (streetname, streetnum)
probably.

I'm not sure what you're intending by comparing ownername to both firstname and lastname. I don't think that'll do
anythinguseful, and doubt it'll ever match. Are you expecting firstname and lastname to be substrings of ownername? If
so,you might need to use wildcards with the like. 

(Also, performance and smart use of indexes tends to get better in newer versions of postgresql. You might want to
upgradeto 9.0.3 too.) 

Cheers,
  Steve



В списке pgsql-performance по дате отправления:

Предыдущее
От: Dan Ancona
Дата:
Сообщение: big joins not converging
Следующее
От: Marti Raudsepp
Дата:
Сообщение: Re: Tuning massive UPDATES and GROUP BY's?