big joins not converging

Поиск
Список
Период
Сортировка
От Dan Ancona
Тема big joins not converging
Дата
Msg-id D1D5E528-DAEB-4E4F-A3E6-EA2A95340D1A@vizbang.com
обсуждение исходный текст
Ответы Re: big joins not converging
Список pgsql-performance
Hi postgressers -

As part of my work with voter file data, I pretty regularly have to join one large-ish (over 500k rows) table to another. Sometimes this is via a text field (countyname) + integer (voter id). I've noticed sometimes this converges and sometimes it doesn't, seemingly regardless of how I index things. So I'm looking for general thoughts on the joining of large tables, but also running into a specific issue with the following slightly different query:

This one is between two tables that are a 754k row list of voters and a 445k row list of property owners. (I'm trying to find records where the owner name matches the voter name at the same address.) I have btree single column indices built on all the relevant fields, and multicolumn indices built across all the columns I'm matching. The full schemas of both tables are below. The machine is an older-ish (3 years ago) dual-core pentium w/ 4GB RAM running FreeBSD, more details below.

This is the query I've come up with so far:

explain analyze
update vanalameda set ownerflag = 'exact'
  from aralameda where
  vanalameda.streetno ~~ aralameda.streetnum and
  vanalameda.streetname ~~ aralameda.streetname and
  vanalameda.lastname ~~ aralameda.ownername and
  vanalameda.firstname ~~ aralameda.ownername;

If I include the analyze, this didn't complete after running overnight. If I drop the analyze and just explain, I get this:

"Nested Loop  (cost=46690.74..15384448712.74 rows=204 width=204)"
"  Join Filter: (((vanalameda.streetno)::text ~~ (aralameda.streetnum)::text) AND ((vanalameda.streetname)::text ~~ (aralameda.streetname)::text) AND ((vanalameda.lastname)::text ~~ (aralameda.ownername)::text) AND ((vanalameda.firstname)::text ~~ (aralameda.ownername)::text))"
"  ->  Seq Scan on vanalameda  (cost=0.00..26597.80 rows=734780 width=204)"
"  ->  Materialize  (cost=46690.74..58735.87 rows=444613 width=113)"
"        ->  Seq Scan on aralameda  (cost=0.00..38647.13 rows=444613 width=113)"

One general question: does the width of the tables (i.e. the numbers of columns not being joined and the size of those fields) matter? The tables do have a lot of extra columns that I could slice out.

Thanks so much!

Dan

System:
client: pgadmin III, Mac OS

server:
select version();
PostgreSQL 8.3.7 on i386-portbld-freebsd7.2, compiled by GCC cc (GCC) 4.2.1 20070719  [FreeBSD]
(installed from freebsd package system, default configuration)

%sysctl -a | egrep -i 'hw.machine|hw.model|hw.ncpu'
hw.machine: i386
hw.model: Genuine Intel(R) CPU            2160  @ 1.80GHz
hw.ncpu: 2
hw.machine_arch: i386

w/ 4GB RAM, 1 1GB disk, no RAID.

Here's the tables...

              Table "public.aralameda"
     Column      |         Type          | Modifiers 
-----------------+-----------------------+-----------
 dt000o039001010 | character varying(13) | 
 o3901010        | character varying(15) | 
 dt17            | character varying(2)  | 
 dt046           | character varying(3)  | 
 streetnum       | character varying(10) | 
 streetname      | character varying(50) | 
 unitnum         | character varying(10) | 
 city            | character varying(30) | 
 zip             | character varying(5)  | 
 unk3            | character varying(1)  | 
 crap1           | character varying(12) | 
 crap2           | character varying(12) | 
 crap3           | character varying(12) | 
 crap4           | character varying(12) | 
 crap5           | character varying(12) | 
 crap6           | character varying(12) | 
 crap7           | character varying(12) | 
 crap8           | character varying(12) | 
 crap9           | character varying(12) | 
 crap10          | character varying(12) | 
 dt2009          | character varying(4)  | 
 dt066114        | character varying(6)  | 
 crap11          | character varying(8)  | 
 crap12          | character varying(8)  | 
 ownername       | character varying(50) | 
 careofname      | character varying(50) | 
 unk4            | character varying(1)  | 
 maddr1          | character varying(60) | 
 munitnum        | character varying(10) | 
 mcitystate      | character varying(30) | 
 mzip            | character varying(5)  | 
 mplus4          | character varying(4)  | 
 dt40            | character varying(2)  | 
 dt4             | character varying(1)  | 
 crap13          | character varying(8)  | 
 d               | character varying(1)  | 
 dt0500          | character varying(4)  | 
 unk6            | character varying(1)  | 
 crap14          | character varying(8)  | 
 unk7            | character varying(1)  | 
Indexes:
    "arall" btree (streetnum, streetname, ownername)
    "aroname" btree (ownername)
    "arstreetname" btree (streetname)
    "arstreetnum" btree (streetnum)

             Table "public.vanalameda"
    Column     |         Type          | Modifiers 
---------------+-----------------------+-----------
 vanid         | character varying(8)  | 
 lastname      | character varying(25) | 
 firstname     | character varying(16) | 
 middlename    | character varying(16) | 
 suffix        | character varying(3)  | 
 streetno      | character varying(5)  | 
 streetnohalf  | character varying(3)  | 
 streetprefix  | character varying(2)  | 
 streetname    | character varying(24) | 
 streettype    | character varying(4)  | 
 streetsuffix  | character varying(2)  | 
 apttype       | character varying(4)  | 
 aptno         | character varying(8)  | 
 city          | character varying(13) | 
 state         | character varying(2)  | 
 zip5          | character varying(5)  | 
 zip4          | character varying(4)  | 
 vaddress      | character varying(33) | 
 maddress      | character varying(41) | 
 mcity         | character varying(25) | 
 mstate        | character varying(2)  | 
 mzip5         | character varying(5)  | 
 mzip4         | character varying(4)  | 
 mstreetno     | character varying(6)  | 
 mstreetnohalf | character varying(9)  | 
 mstreetprefix | character varying(2)  | 
 mstreetname   | character varying(40) | 
 mstreettype   | character varying(4)  | 
 mstreetsuffix | character varying(2)  | 
 mapttype      | character varying(4)  | 
 maptno        | character varying(13) | 
 dob           | character varying(10) | 
 countyfileid  | character varying(7)  | 
 countyid      | character varying(3)  | 
 affno         | character varying(12) | 
 ownerflag     | character varying(20) | 
Indexes:
    "vanall" btree (streetno, streetname, lastname, firstname)
    "vanfname" btree (firstname)
    "vanlname" btree (lastname)
    "vanstreetname" btree (streetname)
    "vanstreetno" btree (streetno)

В списке pgsql-performance по дате отправления:

Предыдущее
От: Merlin Moncure
Дата:
Сообщение: Re: unexpected stable function behavior
Следующее
От: Steve Atkins
Дата:
Сообщение: Re: big joins not converging