Обсуждение: Any Good Way To Do Sync DB's?
I have a project where a client has products stored in a large Progress DB on an NT server. The web server is a FreeBSD box though, and the client wants to try to avoid the $5,500 license for the Unlimited Connections via OpenLink software and would like to take advantage of the 'free' non-expiring 2 connection (concurrent) license. This wouldn't be a huge problem, but the DB can easily reach 1 million records. Is there any good way to pull this data out of Progess and get it into Postgres? This is way too large of a db to do a "SELECT * FROM table" and do an insert for each row. Any brilliant ideas? Thanks, Joe
Joseph Koenig <joe@jwebmedia.com> writes: > I have a project where a client has products stored in a large Progress > DB on an NT server. The web server is a FreeBSD box though, and the > client wants to try to avoid the $5,500 license for the Unlimited > Connections via OpenLink software and would like to take advantage of > the 'free' non-expiring 2 connection (concurrent) license. This wouldn't > be a huge problem, but the DB can easily reach 1 million records. Is > there any good way to pull this data out of Progess and get it into > Postgres? This is way too large of a db to do a "SELECT * FROM table" > and do an insert for each row. Any brilliant ideas? Thanks, Probably the best thing to do is to export the data from Progress in a format that the PostgreSQL COPY command can read. See the docs for details. -Doug -- Let us cross over the river, and rest under the shade of the trees. --T. J. Jackson, 1863
On 12 Oct 2001, Doug McNaught wrote: > Joseph Koenig <joe@jwebmedia.com> writes: > > > I have a project where a client has products stored in a large Progress > > DB on an NT server. The web server is a FreeBSD box though, and the > > client wants to try to avoid the $5,500 license for the Unlimited > > Connections via OpenLink software and would like to take advantage of > > the 'free' non-expiring 2 connection (concurrent) license. This wouldn't > > be a huge problem, but the DB can easily reach 1 million records. Is > > there any good way to pull this data out of Progess and get it into > > Postgres? This is way too large of a db to do a "SELECT * FROM table" > > and do an insert for each row. Any brilliant ideas? Thanks, > > Probably the best thing to do is to export the data from Progress in a > format that the PostgreSQL COPY command can read. See the docs for > details. I'm going to have to rant now. The "dump" and "restore" which use the COPY method are actually totally useless for large databases. The reason for this is simple. Copying a 4 GB table with 40M rows requires over 40GB of temporary scratch space to copy, due to the WAL temp files. That sounds totally silly. Why doesn't pg_dump insert commits every 1000 rows or so??? Cheers. Gordan
On Fri, 12 Oct 2001, Joseph Koenig wrote: > I have a project where a client has products stored in a large Progress > DB on an NT server. The web server is a FreeBSD box though, and the > client wants to try to avoid the $5,500 license for the Unlimited > Connections via OpenLink software and would like to take advantage of > the 'free' non-expiring 2 connection (concurrent) license. This wouldn't > be a huge problem, but the DB can easily reach 1 million records. Is > there any good way to pull this data out of Progess and get it into > Postgres? This is way too large of a db to do a "SELECT * FROM table" > and do an insert for each row. Any brilliant ideas? Thanks, Yes. Throw away NT, and replace it with Linux or FreeBSD. They are free, they are much more stable, and you will find that PostgreSQL on Linux is likely to utterly blow away PostgreSQL on NT. IIRC, something to do with the way NT handles forking and threads. Allegedly, this is particularly noticeable in a heavy multi-user environment, where lots of connections are spawned and killed. Another thing you could try - Apache persistent DBI connections using mod_perl. Set up Apache to use mod_perl and the persistent DBI database connections through it. Then limit the number of Apache instances to 2 (i.e. up to 2 simultaneous connections). Then put an accelerating squid proxy in front of the Apache web server, so that it can handle static objects without wasting precious Apache back-ends. All this seems like a hugely complicated solution when you could just replace NT with Linux or FreeBSD and have a cheaper and MUCH faster sollution, not to mention more stable. Just my $0.02 Regards. Gordan
Gordan Bobic <gordan@bobich.net> writes:
> I'm going to have to rant now. The "dump" and "restore" which use the COPY
> method are actually totally useless for large databases. The reason for
> this is simple. Copying a 4 GB table with 40M rows requires over 40GB of
> temporary scratch space to copy, due to the WAL temp files. That sounds
> totally silly. Why doesn't pg_dump insert commits every 1000 rows or so???
Yup, "rant" is the correct term.  The WAL file growth problem has been
fixed as of 7.1.3, so your premise is false.
            regards, tom lane
			
		On Saturday 13 Oct 2001 15:54, you wrote: > Gordan Bobic <gordan@bobich.net> writes: > > I'm going to have to rant now. The "dump" and "restore" which use the > > COPY method are actually totally useless for large databases. The reason > > for this is simple. Copying a 4 GB table with 40M rows requires over 40GB > > of temporary scratch space to copy, due to the WAL temp files. That > > sounds totally silly. Why doesn't pg_dump insert commits every 1000 rows > > or so??? > > Yup, "rant" is the correct term. The WAL file growth problem has been > fixed as of 7.1.3, so your premise is false. D'oh! OK, I'm off to upgrade... I'll just shut up, shall I... ;-) Thanks for the tip. :-) Regards. Gordan
On 12 Oct 2001, Doug McNaught wrote:
> Probably the best thing to do is to export the data from Progress in a
> format that the PostgreSQL COPY command can read.  See the docs for
> details.
Hi,
    I wrote a quick and dirty function/trigger to sync two DBs - one
local and the other on the web. The method is quite simple. Any insert or
update fires a trigger that "serialises" the entries and stores it in a
log with the table name, the primary key and the timestamp. When an entry
is deleted, the same happens except that the serialised column contains a
null. So when I sync, I just need to upload the changes and not the entire
dump. I think this is a good opportunity to get some advice feedback on
the code, so here it is:
--------------------------------------------------------------------------------------------
drop function setuptriggers();
create function setuptriggers() returns int as '
       declare
        fb        text;
        tb        text;
        tresult        record;
        cresult        record;
        pkeyname    name;
    begin
        for tresult in select * from pg_class
                  where relkind = ''r''
                  and   relname !~ ''^pg_''
                  and   relname !~ ''^Inv''
                  and   relname !~ ''^pga_''
                  order by relname
        loop
            select into pkeyname c.attname from pg_class a, pg_index b, pg_attribute c
            where a.relname = tresult.relname and
                  a.oid = b.indrelid and
                  a.oid = c.attrelid and
                  b.indkey[0] = c.attnum and
                  b.indisprimary=''t'';
            if pkeyname is not null and tresult.relname != ''logtable'' then
               fb := ''
                  create function logchange_'' || quote_ident(tresult.relname) || ''() returns opaque as ''''
                  declare
                     serialized text;
                     updatetime timestamp;
                     separator text;
                  begin
                     updatetime := ''''''''now'''''''';
                     separator := chr(178);
                     serialized := '''''''''''''''';
                  '';
                  for cresult in select * from pg_class a, pg_attribute b
                       where a.relname = tresult.relname and
                                       a.oid = b.attrelid and
                                       b.attnum > 0
                       order by b.attnum
                  loop
                fb := fb || ''   if NEW.'' || quote_ident(cresult.attname) || '' is not null then
                      serialized := serialized || separator || '''''''''' || quote_ident(cresult.attname) ||
''=''''''''||  NEW.'' || quote_ident(cresult.attname) || ''; 
                     end if;
                  '';
                  end loop;
                  fb := fb || ''   insert into logtable values (NEW.''|| quote_ident(pkeyname) || '', '''''''''' ||
quote_ident(tresult.relname)|| '''''''''', serialized, updatetime); 
                     return new;
                  end;''''
                  language ''''plpgsql'''';'';
                  execute fb;
                  tb := ''create trigger fireon_'' || quote_ident(tresult.relname) || '' before insert or update on ''
||quote_ident(tresult.relname) || '' 
                        for each row execute procedure logchange_'' || quote_ident(tresult.relname) || ''();'';
                  execute tb;
            end if;
        end loop;
        return 1;
    end;'
language 'plpgsql';
-------------------------------------------------------------------------------------------------
I hope this is usefule
			
		Your solution sounds very interesting (Not the throw away NT part...)...does anyone else have any input on this? Would it work well? Any idea as to what amount of traffic it would be capable of handling? If apache is only running in two instances, would that really keep the number of ODBC connections to 2 at max? By the way, I fully agree with throwing away NT -- unfortunately this DB is part of their inventory and POS system that another company set up for them. They just finished investing about $200K in the system, so just telling them to get rid of it isn't quite an answer. Thanks for all of the input from everyone. Joe Gordan Bobic wrote: > > On Fri, 12 Oct 2001, Joseph Koenig wrote: > > > I have a project where a client has products stored in a large Progress > > DB on an NT server. The web server is a FreeBSD box though, and the > > client wants to try to avoid the $5,500 license for the Unlimited > > Connections via OpenLink software and would like to take advantage of > > the 'free' non-expiring 2 connection (concurrent) license. This wouldn't > > be a huge problem, but the DB can easily reach 1 million records. Is > > there any good way to pull this data out of Progess and get it into > > Postgres? This is way too large of a db to do a "SELECT * FROM table" > > and do an insert for each row. Any brilliant ideas? Thanks, > > Yes. Throw away NT, and replace it with Linux or FreeBSD. They are free, > they are much more stable, and you will find that PostgreSQL on Linux is > likely to utterly blow away PostgreSQL on NT. IIRC, something to do with > the way NT handles forking and threads. Allegedly, this is particularly > noticeable in a heavy multi-user environment, where lots of connections > are spawned and killed. > > Another thing you could try - Apache persistent DBI connections using > mod_perl. Set up Apache to use mod_perl and the persistent DBI database > connections through it. Then limit the number of Apache instances to 2 > (i.e. up to 2 simultaneous connections). Then put an accelerating > squid proxy in front of the Apache web server, so that it can handle > static objects without wasting precious Apache back-ends. > > All this seems like a hugely complicated solution when you could just > replace NT with Linux or FreeBSD and have a cheaper and MUCH faster > sollution, not to mention more stable. > > Just my $0.02 > > Regards. > > Gordan > > ---------------------------(end of broadcast)--------------------------- > TIP 2: you can get off all lists at once with the unregister command > (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
On Monday 15 Oct 2001 13:35, Joseph Koenig wrote: > Your solution sounds very interesting (Not the throw away NT > part...) That is where a signifficant part of the performance improvement would come from, if performance was what you were after... > ...does anyone else have any input on this? Would it work well? It works great - I am using it on a number of servers. It also gives you a much more secure setup, because you can pipe everything through a single firewall with a squid accelerating proxy. That means that a hacker would have to breach your firewall befure they would have even a remote opportunity to penetrate deeper into your network and attack the web server. > Any idea as to what amount of traffic it would be capable of handling? Any amount of traffic, it just means that only two connections would be served at any one time. If your query return and CGI page construction times are very low (which they should be if you are using mod_perl on a decent server), then you are unlikely to notice any slowdown. The excess hits would just get queued and processed when back-ends become available. It would also be a good idea to look through your scripts and attempt to connect the database in Perl::DBI as late as possible (not at the beginning of the script as a global variable, but just before the query is issued), and disconnect as early as possible (as soon as the query is executed and the data retrieved). This would not hog the connections to the PostgreSQL end as much. Of course, this is all provided you use perl CGIs and mod_perl on Apache. Otherwise, persistent connections don't come into the picture at all. > If apache is only running in two instances, would that really keep the > number of ODBC connections to 2 at max? No, that would only keep the number of your connections from the WEB SERVER to two max. ODBC connections from the rest of your network would be a completely separate issue. Basically, if Apache only ever has two instances running, then it can only ever serve up to 2 requests at any one time, and hence the database server will never receive more than 2 simultaneous request from the web server - UNLESS your CGIs make multiple database connections in parallel. If they do, then you're fighting a loosing battle, and you might as well give up. If your CGIs only ever use one connection, then putting the web server behind an accelerating squid proxy would actually help further, by not using the Apache back ends to serve static documents, such as frames pages, gif/jpg background or button images, etc. This means that only the actual CGI requests would go to Apache. I have implemented this sort of a load-reduction solution with a reduced number of active Apache servers in my last consultancy contract, and it worked very well. > By the way, I fully agree with throwing away NT -- Understandable... > unfortunately this DB is part of their inventory and > POS system that another company set up for them. They just finished > investing about $200K in the system, so just telling them to get rid of > it isn't quite an answer. Thanks for all of the input from everyone. Hang on - if they have just invested $200K into this "solution", then why exactly is an additional $5,500 for another licence a problem all of a sudden??? Regards. Gordan